BACKGROUND OF THE INVENTION

Obtaining genotype information on thousands of polymorphic markers in a 
highly parallel fashion is becoming an increasingly important task in mapping disease 
loci, in identifying quantitative trait loci, in diagnosing tumor loss of heterozygosity, 
and in performing linkage studies. A currently available method for simultaneously 
obtaining large numbers of polymorphic marker genotypes involves hybridization to 
allele specific probes on high density oligonucleotide arrays. In order to practice the 
method, redundant sets of hybridization probes, typically twenty or more, are used to 
score each marker. A high degree of redundancy is required, however, to reduce the 
noise and achieve an acceptable level of accuracy. Even this level of redundancy is 
often insufficient to unambiguously score heterozygotes or to quantitatively determine 
allele frequency in a population. Thus, there is a need in the art for more reliable and 
better quantitative methods to identify genotypes at polymorphic markers. 
SUMMARY OF THE INVENTION

An array of oligonucleotide tags attached to a solid substrate is disclosed, along
with locus-specific tagged oligonucleotides. The array and the locus-specific tagged
oligonucleotides are particularly useful in genotyping using single base extension
reactions. When used together, the array and the locus-specific tagged oligonucleotides
serve as a "universal chip" system for use in genotyping, wherein by using different sets
of locus-specific tagged oligonucleotides the system can be tailored to any desired
genotyping application. For example, it is an object of the present invention to provide
a method to aid in determining a ratio of alleles at a polymorphic locus. It is another
object of the invention to provide a set of primers for use in determining a ratio of
nucleotides present at a polymorphic locus.

Thus, in one embodiment the invention relates to an array comprising one or
more oligonucleotide tags fixed to a solid substrate, wherein each oligonucleotide tag
comprises a unique known arbitrary nucleotide sequence of sufficient length to
hybridize to a locus-specific tagged oligonucleotide, wherein the locus-specific tagged
oligonucleotide has at its first end a nucleotide sequence which hybridizes to, e.g., is
complementary to, the arbitrary sequence of the oligonucleotide tag, and wherein the
locus-specific tagged oligonucleotide has at a second end a nucleotide sequence
complementary to a target polynucleotide sequence in a sample.
In one embodiment, the invention relates to a kit comprising an array comprising
one or more oligonucleotide tags fixed to a solid substrate, wherein each
oligonucleotide tag comprises a unique known arbitrary nucleotide sequence of
sufficient length to hybridize to a locus-specific tagged oligonucleotide, and one or
more locus-specific tagged oligonucleotides, wherein each locus-specific tagged
oligonucleotide has at its first (5') end a nucleotide sequence which hybridizes to, e.g., is
complementary to, the arbitrary sequence of a corresponding oligonucleotide tag on the
array, and has at its second (3') end a nucleotide sequence complementary to a target
polynucleotide sequence in a sample.
The invention further relates to a method of genotyping a nucleic acid sample at
one or more loci, comprising the steps of obtaining a nucleic acid sample to be tested;
combining the nucleic acid sample with one or more locus-specific tagged
oligonucleotides under conditions suitable for hybridization of the nucleic acid sample
to one or more locus-specific tagged oligonucleotides, wherein each locus-specific
tagged oligonucleotide comprises a nucleotide sequence capable of hybridizing to a
complementary sequence in an oligonucleotide tag and a nucleotide sequence
complementary to the nucleotide sequence 5' of a nucleotide to be queried in the
sample, thereby creating an amplification product-locus-specific tagged oligonucleotide
complex; subjecting the complex to a single base extension reaction, wherein the
reaction results in the addition of a labeled ddNTP to the locus-specific tagged
oligonucleotide, and wherein each type of ddNTP has a label that can be distinguished
from the label of the other three types of ddNTPs; contacting the complex with an
oligonucleotide array comprising one or more oligonucleotide tags fixed to a solid
substrate under suitable hybridization conditions, wherein each oligonucleotide tag
comprises a unique arbitrary sequence complementary and of sufficient length to
hybridize to a complementary sequence in a locus-specific tagged oligonucleotide,
whereby the complex hybridizes to a specific oligonucleotide tag on the array; and
assaying the array to determine the labeled ddNTPs present in the complex hybridized
to one or more oligonucleotide tags, thereby determining the genotype of the queried
nucleotide in the sample. In one embodiment the nucleic acid sample to be tested is
amplified.

In one embodiment a method is provided to aid in determining a ratio of alleles
at a polymorphic locus in a sample. A pair of primers is used to amplify a region of a
nucleic acid in a sample. In one embodiment, the region comprises a polymorphic
locus, and an amplified nucleic acid product is formed which comprises the
polymorphic locus. The amplified nucleic acid product is used as a template in a single
base extension reaction with an extension primer, forming a labeled extension primer.
The extension primer (also called a locus-specific tagged oligonucleotide herein)
comprises a 3' portion and a 5' portion. The 3' portion is complementary to the
amplified nucleic acid product and terminates one nucleotide 5' to the polymorphic
locus. The 5' portion is not complementary to the amplified nucleic acid product. A
labeled dideoxynucleotide which is complementary to the polymorphic locus is coupled
to the 3' end of the extension primer. Each type of dideoxynucleotide present in the
reaction bears a distinct label. The 5' portion of the extension primer is hybridized to
one or more probes (also called oligonucleotide tags herein) which are immobilized to
known locations on a solid support. The probes comprise a nucleotide sequence which
is complementary to the 5' portion of the extension primer.
Also provided by the present invention is a set of primers for use in determining
a ratio of nucleotides present at a polymorphic locus. The set includes a pair of
amplification primers and an extension primer. The pair of primers prime synthesis of a
region of double stranded nucleic acid which comprises a polymorphic locus. The
extension primer comprises a 3' portion which is complementary to a portion of the
region of double stranded nucleic acid and a 5' portion which is not complementary to
the region of double stranded nucleic acid. The extension primer terminates one
nucleotide 5' to the polymorphic locus. Examples of primers according to the invention
are shown in Table 1.

Another embodiment of the invention provides a method to aid in determining a
ratio of alleles at a polymorphic locus in a sample. Any nucleic acid molecule,
including genomic DNA, which comprises one or more polymorphic loci is used as a
template in a single base extension reaction with an extension primer, forming a labeled
extension primer. The extension primer comprises a 3' portion and a 5' portion. The 3'
portion is complementary to the nucleic acid molecule and terminates one nucleotide 5'
to the polymorphic locus. The 5' portion is not complementary to the nucleic acid
molecule. A labeled dideoxynucleotide which is complementary to the polymorphic
locus is coupled to the 3' end of the extension primer. Each type of dideoxynucleotide
present in the reaction bears a distinct label. The 5' portion of the extension primer is
hybridized to one or more probes which are immobilized to known locations on a solid
support.

These and other embodiments of the invention which are described in more
detail below provide the art with methods and tools for rapidly and easily determining
genotypes of individuals and allele frequencies in populations.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a diagram of the universal array. The solid substrate (e.g., a glass slide)
is depicted on the left and different oligonucleotide tags ("A", "B", "C", etc.) are shown
attached to the solid substrate. The nucleotide sequence on the right-hand end of each
oligonucleotide tag ("Tag A", "Tag B", "Tag C") is arbitrary unique sequence; that is, it
is designed and synthesized to be unique to each oligonucleotide tag.

Fig. 2 is a diagram depicting a locus-specific tagged oligonucleotide. The
nucleotide sequence at the left-hand end is complementary to the arbitrary sequence of
one of the oligonucleotide tags depicted in Fig. 1. The nucleotide sequence at the
right-hand end is complementary to the amplification product of a known polymorphic
locus (e.g., a single nucleotide polymorphism (SNP)). Therefore, locus-specific tagged
oligonucleotide "A" comprises a nucleotide sequence complementary to the arbitrary
sequence of the "Tag A" oligonucleotide tag depicted in Fig. 1, and also comprises
sequence complementary to SNP "A".
Fig. 3 is a diagram showing the hybridization of the locus-specific tagged
oligonucleotide to the amplification product. The locus-specific sequence (right-hand
end) of the oligonucleotide is designed so that it terminates one nucleotide immediately
before (5' of) the nucleotide to be genotyped (shown in box).

Fig. 4 is a diagram depicting the labeling of the locus-specific tagged
oligonucleotide-amplification primer complex via single base extension. During the
reaction, a single labeled ddNTP complementary to the queried nucleotide is
enzymatically added to the 3' end of the locus-specific tagged oligonucleotide. The
nucleotide is shown in the box.
Fig. 5 is a diagram depicting the hybridization of the complex of the
amplification product and the locus-specific tagged oligonucleotide to the
oligonucleotide tags on the array. The solid substrate to which the oligonucleotide tags
of the array are bound is shown on the left, with the individual addresses labeled as "A",
"B", etc. Each oligonucleotide tag is shown at its address. The locus-specific tagged
oligonucleotide is shown hybridized to the oligonucleotide tag, and the amplification
product is in turn bound to the locus-specific tagged oligonucleotide. The locus-specific
tagged oligonucleotide is bound to a labeled (■, etc.) nucleotide as a result of single
base extension. Although a single complex is shown at each address, in reality, many
such oligonucleotide tags are located at each address; that is, the substrate surface at
address "A" has many copies of oligonucleotide tag "A" attached to it, etc.

Fig. 6 is a diagram depicting the hybridization as in Fig. 5, but the sample at
address "B" is heterozygous for the queried nucleotide.

Fig. 7 is a schematic showing the combined use of amplification, single base
extension of a tagged primer, and hybridization to a tag array.

Fig. 8 shows a quantitative measurement of allele frequency. Template-T
(5'-TGCTGAATATTCAGATTCTCTAGTGCTACCTGAAAGATCCTG-3'; SEQ ID NO: 1) and
Template-G
(5'-TGCTGAATATTCAGATTCTCGAGTGCTACCTGAAAGATCCTG-3'; SEQ ID NO: 2)
were mixed at different ratios (6 nM/60 nM, 6 nM/18 nM, 6 nM/6 nM, 18 nM/6 nM,
60 nM/6 nM, 180 nM/6 nM). Six SBE primers
(5'-CACCATGCTCAC^TGAATGCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 3);
5'-GATAATTCTCTGATAGGCCGCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 4);
5'-GACTACGATGTGATCCGTGTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 5);
5'-GAACGCAGTTATCAGACTCTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 6);
5'-CGAGGACATGGAGTCACATCCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 7); and
5'-GCTAGGCATTCCTCCAGTGTCAGGATCTTTCAGGTAGCACT-3' (SEQ ID NO: 8))
were separately added to six SBE reactions which contain the mixed templates
of different ratios. The SBE primers were extended in the presence of biotin-labeled
ddATP and fluorescein-labeled ddCTP (see Examples) and pooled and hybridized to the
tag array. The intensity ratio of the two colors (the y-axis) was plotted against the ratio
of the two mixed templates (the x-axis).

Fig. 9 shows a clustering analysis of the tag array hybridization results in 44 
individuals at marker GMP- 140.25. 

DETAILED DESCRIPTION OF THE INVENTION 

The invention features a generic or universal genotyping array, consisting of
oligonucleotide tags attached to a solid substrate (Fig. 1). Each address in the array
(e.g., "A", "B", "C", etc.) has an oligonucleotide tag associated with it. The
oligonucleotide tag at a given address is attached to the solid substrate, and comprises a
unique arbitrary nucleotide sequence. That is, the nucleotide sequence is unique for the
oligonucleotide tag at each address, i.e., the nucleotide sequence for "tag A" is different
from the nucleotide sequence for all other tags in the array. The nucleotide sequence for
each tag is arbitrary in that it can be any sequence, provided that it is different from the
nucleotide sequence for every other tag in the array. Preferably the oligonucleotide tag
is from about 20 to about 50 nucleotides in length. It may also be desirable to design
the nucleotide sequence of the oligonucleotide tag such that it does not facilitate an
undesirable interaction, e.g., with the target nucleic acid molecule (amplified product).
The oligonucleotide array is used in conjunction with locus-specific tagged
oligonucleotides. Each oligonucleotide tag in the array corresponds to a locus-specific
tagged oligonucleotide. One end (the 5' end) of the locus-specific tagged
oligonucleotide comprises a nucleotide sequence which is complementary to the
arbitrary sequence of the corresponding oligonucleotide tag, and the other end (the 3'
end) comprises a nucleotide sequence which is complementary to the target nucleic
acid molecule (Fig. 2). The portion of the locus-specific tagged oligonucleotide which
hybridizes to the target nucleic acid molecule is preferably from about 15 to about 30
nucleotides in length. For example, the 5' end of locus-specific tagged oligonucleotide
"A" would be complementary to the arbitrary sequence at the end of oligonucleotide
tag "A", which is bound to the substrate, and its 3' end would be complementary to the
sequence of target "A".

To genotype a nucleic acid sample from an individual at locus "A", locus "A" is
amplified from the nucleic acid molecules in the sample. Locus-specific tagged
oligonucleotides complementary to the nucleotide sequence at locus "A" are combined
with the amplification product under conditions suitable for hybridization (Fig. 3). The
hybridization complex is subjected to single base extension. The four types of ddNTPs
in the reaction mixture have different labels (e.g., four different fluorescent tags; i.e.,
the ddATPs would have an attached fluorophore that fluoresced at a first wavelength,
the ddCTPs would have an attached fluorophore that fluoresced at a second wavelength,
the ddGTPs would have an attached fluorophore that fluoresced at a third wavelength,
and the ddTTPs would have an attached fluorophore that fluoresced at a fourth
wavelength). During the single base extension reaction, a single ddNTP is attached
(Fig. 4), resulting in the formation of a complex composed of the locus-specific tagged
oligonucleotide extended with the labeled ddNTP and the amplification product.
After the single base extension reaction, the complex of the labeled (extended)
locus-specific tagged oligonucleotide and the amplification product is hybridized to the
array (Fig. 5). The oligonucleotide tag "A" at address "A" selectively hybridizes to its
corresponding locus-specific tagged oligonucleotide (now extended with a labeled
ddNTP), the oligonucleotide tag "B" at address "B" selectively hybridizes to its
corresponding locus-specific tagged oligonucleotide (now extended with a labeled
ddNTP), etc. The array is assayed to determine which label(s) is (are) present at which
address on the array. For instance, if address "A" fluoresced at the same wavelength as
the label on the ddATP, then the amplification product clearly contained a T at the
queried nucleotide (because the single base extension reaction attaches the ddNTP
complementary to the queried nucleotide). Fluorescence at a wavelength which is the
same as the ddCTP label would indicate that the genotype was a "G", etc. Detection of
two peaks within the wavelength emitted would indicate that different nucleotides were
present at the queried position in the sample, e.g., that the individual was heterozygous
at that locus.
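
The read-out logic described above can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not a procedure given in this specification: the function names, the per-base intensity dictionary, and the fixed detection threshold are hypothetical, and a real assay would need its own background correction and calling rules.

```python
# Watson-Crick complement: the labeled ddNTP is complementary to the queried base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_genotype(signals, threshold):
    """Return the queried-strand bases implied by the labels seen at one address.

    signals   -- dict mapping ddNTP base ("A", "C", "G", "T") to measured intensity
    threshold -- hypothetical cutoff above which a label counts as detected
    """
    detected = [base for base, value in signals.items() if value >= threshold]
    return sorted(COMPLEMENT[base] for base in detected)


def describe(bases):
    """Summarize a call as homozygous, heterozygous, or no call."""
    if not bases:
        return "no call"
    if len(bases) == 1:
        return "homozygous " + bases[0]
    return "heterozygous " + "/".join(bases)


# Address "A": only the ddATP label is seen, so the queried base is T.
address_a = call_genotype({"A": 950, "C": 20, "G": 15, "T": 10}, threshold=100)
# Address "B": ddATP and ddCTP labels are both seen, so T and G are both present.
address_b = call_genotype({"A": 500, "C": 480, "G": 12, "T": 9}, threshold=100)
print(describe(address_a))
print(describe(address_b))
```

A single detected label maps to one base at the queried position, while two detected labels at the same address correspond to the heterozygous case shown in Fig. 6.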

An advantage of the array and method described herein is that many addresses
can be assayed simultaneously, producing genotyping data for many different genetic
loci, e.g., SNPs. By utilizing a predefined set of locus-specific tagged oligonucleotides,
the array can be utilized for a particular purpose, and by utilizing a different set of
locus-specific tagged oligonucleotides which correspond to the same tags on the array,
the same array can be utilized for a different purpose. The universal chip serves as the
repository of a set of addresses to which the locus-specific tagged oligonucleotides
(along with the labeled, genotyped SNPs) hybridize in a planned, predetermined manner.
The array and set(s) of locus-specific tagged oligonucleotides can therefore be used as
components in kits for the purposes of sequencing and genotyping. Sets of locus-specific
tagged oligonucleotides can therefore be used in combination with arrays as described
herein for use in forensics, identification of individuals, and disease
diagnosis/prognosis.
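
The way one array serves different applications can be sketched in a few lines of Python. The example below is illustrative only: the tag sequences, the locus-specific sequences, and the function names are hypothetical and are not taken from this specification. It builds two sets of locus-specific tagged oligonucleotides that reuse the same array addresses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tagged_oligo(tag_on_array, locus_specific):
    """Join a 5' portion that pairs with the array tag to a 3' locus-specific portion."""
    return reverse_complement(tag_on_array) + locus_specific


def build_set(array, loci):
    """Build one tagged oligonucleotide per address for a given set of loci."""
    return {addr: tagged_oligo(array[addr], loci[addr]) for addr in loci}


# Hypothetical universal array: address -> immobilized tag (5'->3').
ARRAY = {"A": "GATTACAGGCTTACGGATCA", "B": "CCTAGGTTAACGGTCATGCA"}

# Two hypothetical applications, each with its own locus-specific 3' portions.
SET_1 = {"A": "ACGTTGCAGGTCAAGCT", "B": "TTGACCGGTAACGTCAG"}
SET_2 = {"A": "GGCATCAGTTCAGGCTA", "B": "CATGGTCAAGCTTGACC"}

oligos_1 = build_set(ARRAY, SET_1)
oligos_2 = build_set(ARRAY, SET_2)
for addr in ARRAY:
    print(addr, oligos_1[addr], oligos_2[addr])
```

Only the 3' portions differ between the two sets; the 5' portions, and therefore the addresses at which the products are read, stay the same.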
The present invention provides a convenient and accurate way of determining
the genotype of an individual at a polymorphic locus or the frequency of alleles in a
population. One embodiment of the method involves three steps: (1) amplification of a
polymorphic locus, (2) primer extension of a sequence-tagged primer with distinct
labels for different polymorphic forms, and (3) hybridization of the extended primers to
the tag array. Each tag represents a distinct polymorphic locus and each distinct label
represents a distinct allelic form at the polymorphic locus. The method permits the
simultaneous determination of genotypes at multiple polymorphic loci or of allele
frequencies in a population. Another embodiment employs just steps 2 and 3.

Advantages of the disclosed method include that just one generic tag array can
be used, so that no locus-specific array is needed. In addition, the pre-selected probe
sequences synthesized on the tag chip guarantee good hybridization results between the
probe and the tag. Moreover, the two color or multiple color approach allows
quantitative determination of the allele frequency in the samples tested. This means very
reliable genotype results can be obtained not only for individual samples, but also for
pooled samples.

A pair of primers or a single primer can be used to amplify a region of a nucleic
acid in a sample. The sample may be from a single individual or may be from a
population of individuals. The region which is amplified includes a polymorphic locus.
The step of amplification is not specific for a particular allele. However, the
amplification is designed to specifically amplify regions of double stranded or single
stranded nucleic acids which contain polymorphic loci.

The amplification step may be carried out using any technique known in the art,
preferably one which amplifies logarithmically. As is known in the art, each primer of a
pair of amplification primers hybridizes to, and is preferably complementary to,
opposite strands of a template. It is preferred that the primers hybridize to a double
stranded nucleic acid at positions which are not more than 2 kb apart, and preferably
which are much closer together, such as not more than 1 kb, 0.5 kb, 0.2 kb, 0.1 kb,
0.01 kb or 0.001 kb apart. A suitable DNA polymerase can be used as is known in the
art. Thermostable polymerases are particularly convenient for thermal cycling of rounds
of primer hybridization, polymerization, and melting. Amplification of single stranded
nucleic acids can also be employed.

After the amplification it is desirable to remove and/or degrade any excess
primers and nucleotides. This can be done by washing and/or enzymatic degradation,
using such enzymes as exonuclease I and alkaline phosphatase, for example. Other
techniques, such as chromatography, magnetic beads, and avidin- or
streptavidin-conjugated beads, as are known in the art for accomplishing the removal
can also be used. It is not necessary to remove or destroy one of two strands of an
amplified DNA product.

The primer extension step of the method is the one which provides
allele-specificity to the method. The primer is designed to terminate one nucleotide 5'
to the polymorphic locus. The primer is hybridized to the denatured amplified double
stranded DNA. When the primer is extended by a single base using dideoxynucleotides
and a DNA polymerase, the dideoxynucleotide which is complementary to the
nucleotide at the polymorphic locus is added. Again, any DNA-dependent DNA
polymerase can be used. These include, but are not limited to, E. coli DNA polymerase
I, Klenow fragment of polymerase I, T4 DNA polymerase, T7 DNA polymerase, and T.
aquaticus DNA polymerase. This reaction is preferably performed at the Tm of the
primer with the template to enhance product formation.
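
Which dideoxynucleotide is added follows directly from where the primer anneals. The sketch below is a minimal Python illustration, with hypothetical function names, that uses Template-T, Template-G, and the shared 3' portion of the SBE primers listed in the description of Fig. 8. It assumes perfect, unique annealing and ignores reaction conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_base(template, primer_3prime):
    """Return the ddNTP base added to a primer annealed to the template.

    Both sequences are written 5'->3'. Returns None if the primer has no
    perfect annealing site or no template base remains to be copied.
    """
    site = reverse_complement(primer_3prime)  # template region paired with the primer
    pos = template.find(site)
    if pos <= 0:
        return None
    queried = template[pos - 1]  # template base just beyond the primer's 3' end
    return queried.translate(COMPLEMENT)


TEMPLATE_T = "TGCTGAATATTCAGATTCTCTAGTGCTACCTGAAAGATCCTG"  # SEQ ID NO: 1
TEMPLATE_G = "TGCTGAATATTCAGATTCTCGAGTGCTACCTGAAAGATCCTG"  # SEQ ID NO: 2
LOCUS_PART = "CAGGATCTTTCAGGTAGCACT"  # 3' portion shared by SEQ ID NOs: 4 to 8

print(extension_base(TEMPLATE_T, LOCUS_PART))  # A
print(extension_base(TEMPLATE_G, LOCUS_PART))  # C
```

The two results correspond to the biotin-labeled ddATP and fluorescein-labeled ddCTP used for Fig. 8.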

One configuration for carrying out the primer extension step utilizes two
different primers which each hybridize to opposite strands of an amplified double
stranded DNA. Each primer terminates one nucleotide 5' to the polymorphic locus. The
primer extension reaction may be more robust with one strand as a template than the
other. In addition, the information obtained from the second strand should confirm the
information obtained from the first strand.
An alternative method for primer extension involves use of reverse transcriptase
and one or two primers which hybridize 3' to the polymorphic locus. This method may
be desirable in cases where "forward" direction primer extension is less robust than is
desirable.

Each different dideoxynucleotide present in the single base extension reaction is
uniquely labeled. The unique label can be detected and its amount will be proportional
to the amount of the particular allele containing the corresponding deoxynucleotide in
the sample. If the sample is from a single individual, the nucleotide bases present at the
polymorphic locus can be determined. If the sample is from a population of individuals
the allele frequency in the population can be determined.
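
Because each label's signal is taken to be proportional to the amount of the corresponding allele, an allele fraction can be estimated from two label intensities. The following Python sketch is an illustration under assumptions that are not part of this specification: the function name, the background values, and the `gain_ratio` calibration factor (the relative signal yield of the two labels, which would in practice come from a standard curve such as the one in Fig. 8) are all hypothetical.

```python
def allele_fraction(signal_1, signal_2, background_1=0.0, background_2=0.0,
                    gain_ratio=1.0):
    """Estimate the fraction of allele 1 from two label intensities.

    gain_ratio -- hypothetical calibration factor: signal per unit of allele 1
                  divided by signal per unit of allele 2
    """
    s1 = max(signal_1 - background_1, 0.0) / gain_ratio
    s2 = max(signal_2 - background_2, 0.0)
    total = s1 + s2
    if total == 0:
        raise ValueError("no signal above background")
    return s1 / total


# Equal label efficiency assumed: 300 vs 100 units gives 75% allele 1.
print(allele_fraction(300, 100))
# Label 1 assumed twice as bright per molecule: 600 vs 100 also gives 75%.
print(allele_fraction(600, 100, gain_ratio=2.0))
```

For a single individual the estimate is expected to fall near 0, 0.5, or 1; for a pooled sample it approximates the allele frequency in the pool.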

The ability to perform the method of the present invention in a multiplex manner
for a number of different polymorphic loci simultaneously is due to the sequence tags
which are present on the extension primers at their 5' ends. The sequence tags permit
the method operator to ultimately sort the products of multiplex amplification and
multiplex primer base extension to different locations on an array. Each sequence tag
on an extension primer is used only for a single polymorphic locus. Thus the products
of primer extension reactions can be separately analyzed because they can be hybridized
to distinct known locations on an array.

The sequence tags are typically totally unrelated to the sequences of the
polymorphic alleles which are being analyzed. The sequence tags are chosen for their
favorable hybridization characteristics. The tags are typically selected so that they have
similar hybridization characteristics and minimal cross-hybridization to other tag
sequences. Each sequence tag is attached to a specific gene or genetic marker, and then
serves as a label for that particular gene or genetic marker. A generic tag array,
corresponding to the pre-selected tag sequences, is fabricated and used to detect the
presence or absence or ratio of specific allelic forms in a test sample. See application
Serial No. 08/626,285 filed April 4, 1996, and EP application no. 97302313.8, which
are expressly incorporated by reference herein.
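
One simple way to screen candidate tags for similar hybridization behavior and low cross-hybridization is sketched below in Python. This is not the selection procedure of the incorporated applications: GC fraction and Hamming distance are used here only as crude, assumed proxies, and the thresholds, function names, and candidate sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases, a rough proxy for duplex stability."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def select_tags(candidates, gc_range=(0.4, 0.6), min_distance=8):
    """Greedily keep tags with similar GC content that differ enough from each other."""
    chosen = []
    for tag in candidates:
        if not gc_range[0] <= gc_fraction(tag) <= gc_range[1]:
            continue  # GC content too far from the rest of the set
        if all(len(tag) == len(t) and hamming(tag, t) >= min_distance
               for t in chosen):
            chosen.append(tag)
    return chosen


candidates = [
    "ACGTACGTACGTACGTACGT",  # kept: 50% GC
    "ACGTACGTACGTACGTACGA",  # rejected: one mismatch from the first tag
    "AAAAAAAAAATTTTTTTTTT",  # rejected: 0% GC
    "TGCATGCATGCATGCATGCA",  # kept: 50% GC, differs at every position
]
print(select_tags(candidates))
```

A fuller screen would also compare each tag against the reverse complements of the other tags and against the target sequences, and would use a thermodynamic model rather than GC content.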
The labels which are used can be any which are known in the art. These include
radiolabels, fluorescent labels, enzyme labels, epitope labels, and high affinity binding
partner labels. Examples include isotopically labeled nucleotides, fluorescein-labeled
nucleotides, biotin-labeled nucleotides, and digoxin-labeled nucleotides. A different
label is assigned to each base dideoxynucleotide in the single base extension reaction.
Two, three, or four different labels can be used in the reaction. The different labels can
be all of the same type, e.g., enzyme labels, or they can be mixed types.

Hybridization of the 5' portion of the extension primers (the tag sequences) to
one or more probes which are immobilized to known locations on a solid support is also
contemplated. Hybridization can be performed under standard conditions known in the
art for obtaining robust signals at high specificity. Standard washing conditions can
also be employed. Detection of hybridization of the extension primers can be done
using standard means, depending on the type of labels used. For example, fluorescence
can be detected and quantified using optical detection means. Radiolabels can be
detected using autoradiography or scintillation counting. Enzyme labels can be detected
using enzymatic reactions and assaying for the final product of the enzyme reaction.
Antigenic labels can be detected using immunological detection means. Affinity binding
partners such as streptavidin or avidin and biotin can also be used as a label.

The reactions of the present invention can be performed in a single or multiplex
format. For example, the amplification step can be performed using up to 20, 30, 40,
50, 75, 100, 150, 200, 250, or 300 different primer pairs to amplify a corresponding
number of polymorphic markers. These can be pooled for the single base extension
reaction, if desired. Pooling for the hybridization step is desirable so that thousands of
hybridizations can be done simultaneously.

In an alternative embodiment the amplification step can be omitted. Thus, if
sufficient DNA is available, the single base extension reaction can be performed directly
on genomic DNA. In another particular embodiment, amplification of the entire
genome can be performed using random primers.
Sets of primers according to the present invention comprise an amplification pair
and an extension primer. These are used together in a method for determining a ratio of
nucleotides present at a polymorphic locus. These may be packaged in a single
container, preferably a divided container or package. The pair of primers amplify a
region of double stranded DNA which comprises a polymorphic locus. The extension
primer has two portions, a 3' portion which is complementary to a portion of the region
of double stranded DNA which contains the polymorphic locus and a 5' portion which is
not complementary to the region of double stranded DNA. The 5' region is the tag
sequence which is complementary to the tag array which is used to sort and analyze the
products of the single base extension reaction. The 3' end of the single base extension
primer terminates one nucleotide 5' to the polymorphic locus.

Kits according to the present invention may contain one or more sets of primers
as described above. The kit may also contain a solid support comprising at least one
probe which is attached to the solid support. The one or more probes are
complementary to the 5' portion of the extension primer, i.e., to the tag sequences.
Solid supports according to the present invention include beads, microtiter plates, and
arrays.



Hybridizing Nucleic Acids to Arrays of Allele-Specific Probes

"Hybridization" refers to the formation of a bimolecular complex of two
different nucleic acids through complementary base pairing. Complementary base
pairing occurs through non-covalent bonding, usually hydrogen bonding, of bases that
specifically recognize other bases, as in the bonding of complementary bases in
double-stranded DNA. In this invention, hybridization is carried out between a target
nucleic acid, which is prepared from the nucleic acid sample by allele-specific
amplification, and at least two probes which have been immobilized on a substrate to
form an array.

One of skill in the art will appreciate that an enormous number of array designs
are suitable for the practice of this invention. An array will typically include a number
of probes that specifically hybridize to the sequences of interest (tags). In addition, it is
preferred that the array include one or more control probes. In one embodiment, the
array is a high density array. A high density array is an array used to hybridize with a
nucleic acid sample to detect the presence of a large number of allelic markers,
preferably more than 10, more preferably more than 100, and most preferably more than
1000 allelic markers.

High density arrays are suitable for quantifying small variations in the frequency
of an allelic marker in the presence of a large population of heterogeneous nucleic acids.
Such high density arrays can be fabricated either by de novo synthesis on a substrate or
by spotting or transporting nucleic acid sequences onto specific locations of a substrate.
Both of these methods produce nucleic acids which are immobilized on the array at
particular locations. Nucleic acids can be purified and/or isolated from biological
materials, such as a bacterial plasmid containing a cloned segment of a sequence of
interest. Suitable nucleic acids can also be produced by amplification of templates or
by synthesis. As a nonlimiting illustration, polymerase chain reaction and/or in vitro
transcription are suitable nucleic acid amplification methods.

The term "target nucleic acid" refers to a nucleic acid (either synthetic or derived
from a biological sample or nucleic acid sample), to which the probe is designed to
specifically hybridize. In this invention, such target nucleic acids are the same as the
sequence tags. It is either the presence or absence of the target nucleic acid that is to be
detected, or the amount of the target nucleic acid that is to be quantified. The target
nucleic acid has a sequence that is complementary to the nucleic acid sequence of the
corresponding probe directed to the target. The term "target nucleic acid" can refer to
the specific subsequence of a larger nucleic acid to which the probe is directed or to the
overall sequence (e.g., gene or mRNA) whose presence it is desired to detect. The
difference in usage will be apparent from context.

As used herein a "probe" is defined as a nucleic acid, capable of binding to a
target nucleic acid of complementary sequence through one or more types of chemical
bonds, usually through complementary base pairing, usually through hydrogen bond
formation. As used herein, a probe can include natural (i.e., A, G, U, C or T) or
modified bases (e.g., 7-deazaguanosine, inosine, etc.). A probe can also include an
oligonucleotide. An oligonucleotide is a single-stranded nucleic acid of 2 to n bases,
where n can be any integer less than 1000. Nucleic acids can be cloned or synthesized
using any technique known in the art. They can also include non-naturally occurring
nucleotide analogs, such as those which are modified to improve hybridization, and
peptide nucleic acids. In addition, the bases in probes may be joined by a linkage other
than a phosphodiester bond, so long as it does not interfere with hybridization. Thus,
probes may be peptide nucleic acids in which the constituent bases are joined by peptide
bonds rather than phosphodiester linkages.

Probe Design

An array includes "test probes", also termed "oligonucleotide tags" herein. Test
probes can be oligonucleotides that range from about 5 to about 45 or 5 to about 500
nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably
from about 15 to about 40 nucleotides in length. In other particularly preferred
embodiments the probes are 20 to 25 nucleotides in length. In another embodiment,
test probes are double or single stranded DNA sequences. DNA sequences can be
isolated or cloned from natural sources or amplified from natural sources using natural
nucleic acids as templates. However, in situ synthesis of probes on the arrays is
preferred. The probes have sequences complementary to particular subsequences of the
genes whose allelic markers they are designed to detect. Thus, the test probes are
capable of specifically hybridizing to the target nucleic acid they are designed to detect.

The term "perfect match probe" refers to a probe which has a sequence designed
to be perfectly complementary to a particular target sequence. The probe is typically
perfectly complementary to a portion (subsequence) of the target sequence. The perfect
match probe can be a "test probe," a "normalization control probe," an expression level
control probe and the like. A perfect match control or perfect match probe is, however,
distinguished from a "mismatch control" or "mismatch probe" or "mismatch control
probe."

In addition to test probes that bind the target nucleic acid(s) of interest, the high
density array can contain a number of control probes. The control probes fall into two
categories: normalization controls and mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are
complementary to labeled reference oligonucleotides or other nucleic acid sequences
that are added to the nucleic acid sample. The signals obtained from the normalization
controls after hybridization provide a control for variations in hybridization conditions,
label intensity, "reading" efficiency, and other factors that may cause the signal of a
perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g.,
fluorescence intensity) read from all other probes in the array are divided by the signal
(e.g., fluorescence intensity) from the control probes, thereby normalizing the
measurements.
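
By way of non-limiting illustration, the normalization just described can be sketched in
Python. The function name, the dictionary layout, and the choice to average several
control signals into a single reference value are assumptions made for this sketch and
are not taken from the disclosure.

```python
from statistics import mean


def normalize_signals(probe_signals, control_signals):
    """Divide each probe signal by the mean normalization-control signal.

    probe_signals: dict mapping a probe identifier to its raw intensity.
    control_signals: list of raw intensities read from the control probes.
    Averaging the controls into one reference is an assumption of this sketch.
    """
    if not control_signals:
        raise ValueError("at least one normalization control signal is required")
    reference = mean(control_signals)
    if reference <= 0:
        raise ValueError("normalization control signal must be positive")
    return {probe: value / reference for probe, value in probe_signals.items()}


if __name__ == "__main__":
    raw = {"tag_1": 200.0, "tag_2": 50.0}  # hypothetical probe identifiers
    print(normalize_signals(raw, [100.0, 100.0]))
```

Dividing by a single reference value keeps the ratios between probes on one array
unchanged while making intensities comparable between arrays.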

Virtually any probe can serve as a normalization control. However, it is
recognized that hybridization efficiency varies with base composition and probe length.
Preferred normalization probes are selected to reflect the average length of the other
probes present in the array; however, they can be selected to cover a range of lengths.
The normalization control(s) can also be selected to reflect the (average) base
composition of the other probes in the array; however, in a preferred embodiment, only
one or a few normalization probes are used and they are selected such that they
hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

Mismatch controls can also be provided for the probes to the target alleles or for
normalization controls. The terms "mismatch control" or "mismatch probe" or
"mismatch control probe" refer to a probe whose sequence is deliberately selected not to
be perfectly complementary to a particular target sequence. Mismatch controls are
oligonucleotide probes or other nucleic acid probes identical to their corresponding test
or control probes except for the presence of one or more mismatched bases. A
mismatched base is a base selected so that it is not complementary to the corresponding
base in the target sequence to which the probe would otherwise specifically hybridize.
One or more mismatches are selected such that under appropriate hybridization
conditions (e.g., stringent conditions) the test or control probe would be expected to
hybridize with its target sequence, but the mismatch probe would not hybridize (or
would hybridize to a significantly lesser extent). Preferred mismatch probes contain a
central mismatch. Thus, for example, where a probe is a 20 mer, a corresponding
mismatch probe will have the identical sequence except for a single base mismatch
(e.g., substituting a G, a C, or a T for an A) at any of positions 6 through 14 (the central
mismatch).
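
As a non-limiting sketch, the 20 mer example above can be expressed in Python by
enumerating every single-base substitution at positions 6 through 14 of a perfect match
probe. The function name and the default window are illustrative assumptions; positions
are counted from 1, as in the text.

```python
BASES = "ACGT"


def central_mismatch_probes(perfect_match, first=6, last=14):
    """Return all single-base mismatch variants at 1-based positions first..last."""
    probe = perfect_match.upper()
    if any(base not in BASES for base in probe):
        raise ValueError("probe must contain only A, C, G, or T")
    if not 1 <= first <= last <= len(probe):
        raise ValueError("mismatch window must lie inside the probe")
    variants = []
    for position in range(first, last + 1):
        original = probe[position - 1]
        for base in BASES:
            if base != original:
                # Replace exactly one base; the rest of the probe is unchanged.
                variants.append(probe[:position - 1] + base + probe[position:])
    return variants


if __name__ == "__main__":
    pm = "ACGTACGTACGTACGTACGT"  # hypothetical 20 mer perfect match probe
    for mismatch in central_mismatch_probes(pm)[:3]:
        print(mismatch)
```

In practice a single mismatch probe per perfect match probe would typically be chosen
from this set; the enumeration only shows which candidates satisfy the stated rule.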

For each mismatch control in a high-density array there typically exists a
corresponding perfect match probe that is perfectly complementary to the same
particular target sequence. The mismatch may comprise one or more bases. While the
mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are
less desirable, as a terminal mismatch is less likely to prevent hybridization of the target
sequence. In a particularly preferred embodiment, the mismatch is located at or near the
center of the probe such that the mismatch is most likely to destabilize the duplex with
the target sequence under the test hybridization conditions.

Mismatch probes provide a control for non-specific binding or
cross-hybridization to a nucleic acid in the sample other than the target to which the probe is
directed. Mismatch probes thus indicate whether or not a hybridization is specific. For
example, if the target is present, the perfect match probes should be consistently
brighter than the mismatch probes. The difference in intensity between the perfect
match and the mismatch probe (I(PM) - I(MM)) provides a good measure of the concentration
of the hybridized material.

The array can also include sample preparation/amplification control probes.
These are probes that are complementary to subsequences of control genes selected
because they do not normally occur in the nucleic acids of the particular biological
sample being assayed. Suitable sample preparation/amplification control probes
include, for example, probes to bacterial genes (e.g., Bio B) where the sample in
question is from a eukaryote.



WO 00/58516 



PCT/US00/08069 



[...] in excess of 100,000 or even 1,000,000 different probes [...]

Forming High Density Arrays

High density arrays [...] 92/10588, U.S. Application [...]

Oligonucleotide arrays have numerous advantages over other methods, such as
efficiency of production, reduced intra- and inter-array variability, increased
information content, and high signal-to-noise ratio.

Preferred high density arrays comprise greater than about 100, preferably greater [...]
or even greater than about 1,000,000 different oligonucleotide
probes, preferably in less than 1 cm² of surface area. The oligonucleotide probes range
from about 5 to about 50 or about 500 nucleotides, more preferably from about 10 to
about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in
length.

Methods of forming high density arrays of oligonucleotides, peptides and other
polymer sequences with a minimal number of synthetic steps are known. The
oligonucleotide analogue array can be synthesized on a solid substrate by a variety of
methods, including, but not limited to, light-directed chemical coupling and
mechanically directed coupling. See Pirrung et al., U.S. Patent No. 5,143,854 (see also
PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO
92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523, which disclose methods of
forming vast arrays of peptides, oligonucleotides and other molecules using,
for example, light-directed synthesis techniques. See also, Fodor et al., Science, 251,
767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as
VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of
polymers is converted, through simultaneous coupling at a number of reaction sites, into
a different heterogeneous array. See, U.S. Application Serial Nos. 07/796,243 and
07/980,523.

The development of VLSIPS™ technology as described in the above-noted U.S.
Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, is
considered pioneering technology in the fields of combinatorial synthesis and screening
of combinatorial libraries. More recently, patent application Serial No. 08/082,937,
filed June 25, 1993, describes methods for making arrays of oligonucleotide probes that
can be used to check or determine a partial or complete sequence of a target nucleic acid
and to detect the presence of a nucleic acid containing a specific oligonucleotide
sequence.

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a
glass surface proceeds using automated phosphoramidite chemistry and chip masking
techniques. In one specific implementation, a glass surface is derivatized with a silane
reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a
photolabile protecting group. Photolysis through a photolithographic mask is used
selectively to expose functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with
those sites which are illuminated (and thus exposed by removal of the photolabile
blocking group). Thus, the phosphoramidites only add to those areas selectively
exposed from the preceding step. These steps are repeated until the desired array of
sequences has been synthesized on the solid surface. Combinatorial synthesis of
different oligonucleotide analogues at different locations on the array is determined by
the pattern of illumination during synthesis and the order of addition of coupling
reagents.

In the event that an oligonucleotide analogue with a polyamide backbone is used
in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite
chemistry to perform the synthetic steps, since the monomers do not attach to one
another via a phosphate linkage. Instead, peptide synthetic methods are substituted.
See, e.g., Pirrung et al., U.S. Pat. No. 5,143,854.

Peptide nucleic acids are commercially available from, e.g., Biosearch, Inc.
(Bedford, MA) which comprise a polyamide backbone and the bases found in naturally
occurring nucleosides. Peptide nucleic acids are capable of binding to nucleic acids
with high specificity, and are considered "oligonucleotide analogues" for purposes of
this disclosure.

Additional methods which can be used to generate an array of oligonucleotides
on a single substrate are described in co-pending Applications Ser. No. 07/980,523,
filed November 20, 1992, and 07/796,243, filed November 22, 1991 and in PCT
Publication No. WO 93/09668. In the methods disclosed in these applications, reagents
are delivered to the substrate by either (1) flowing within a channel defined on
predefined regions or (2) "spotting" on predefined regions or (3) through the use of
photoresist. However, other approaches, as well as combinations of spotting and
flowing, can be employed. In each instance, certain activated regions of the substrate
are mechanically separated from other regions when the monomer solutions are
delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of the
present invention can generally be described as follows. Diverse polymer sequences are
synthesized at selected regions of a substrate or solid support by forming flow channels
on a surface of the substrate through which appropriate reagents flow or in which
appropriate reagents are placed. For example, assume a monomer "A" is to be bound to
the substrate in a first group of selected regions. If necessary, all or part of the surface
of the substrate in all or a part of the selected regions is activated for binding by, for
example, flowing appropriate reagents through all or some of the channels, or by
washing the entire substrate with appropriate reagents. After placement of a channel
block on the surface of the substrate, a reagent having the monomer A flows through or
is placed in all or some of the channel(s). The channels provide fluid contact to the first
selected regions, thereby binding the monomer A on the substrate directly or indirectly
(via a spacer) in the first selected regions.

Thereafter, a monomer "B" is coupled to second selected regions, some of which
can be included among the first selected regions. The second selected regions will be in
fluid contact with a second flow channel(s) through translation, rotation, or replacement
of the channel block on the surface of the substrate; through opening or closing a
selected valve; or through deposition of a layer of chemical or photoresist. If necessary,
a step is performed for activating at least the second regions. Thereafter, the monomer
B is flowed through or placed in the second flow channel(s), binding monomer B at the
second selected locations. In this particular example, the resulting sequences bound to
the substrate at this stage of processing will be, for example, A, B, and AB. The process
is repeated to form a vast array of sequences of desired length at known locations on the
substrate.
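
The stepwise coupling described above can be modeled, purely as a bookkeeping sketch,
in Python. Each step delivers one monomer to a chosen set of regions, and the sequence
at each region is the ordered record of the monomers it has received. The function
name and the integer region indices are assumptions of this sketch; no chemistry is
simulated.

```python
def flow_channel_synthesis(n_regions, steps):
    """Track the sequence built at each region of a substrate.

    steps: ordered list of (monomer, regions) pairs, where regions is an
    iterable of region indices contacted by the flow channel in that step.
    """
    sequences = [""] * n_regions
    for monomer, regions in steps:
        for region in regions:
            if not 0 <= region < n_regions:
                raise IndexError(f"region {region} is outside the substrate")
            # Coupling appends the monomer to whatever is already bound there.
            sequences[region] += monomer
    return sequences


if __name__ == "__main__":
    # Monomer A to the first selected regions (0 and 2), then monomer B to
    # the second selected regions (1 and 2), which overlap the first at 2.
    print(flow_channel_synthesis(3, [("A", [0, 2]), ("B", [1, 2])]))
```

With the two steps shown, the three regions carry A, B, and AB respectively, which
corresponds to the example given in the text.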

After the substrate is activated, monomer A can be flowed through some of the
channels, monomer B can be flowed through other channels, a monomer C can be
flowed through still other channels, etc. In this manner, many or all of the reaction
regions are reacted with a monomer before the channel block must be moved or the
substrate must be washed and/or reactivated. By making use of many or all of the
available reaction regions simultaneously, the number of washing and activation steps
can be minimized.

One of skill in the art will recognize that there are alternative methods of
forming channels or otherwise protecting a portion of the surface of the substrate. For
example, according to some embodiments, a protective coating such as a hydrophilic or
hydrophobic coating (depending upon the nature of the solvent) is utilized over portions
of the substrate to be protected, sometimes in combination with materials that facilitate
wetting by the reactant solution in other regions. In this manner, the flowing solutions
are further prevented from passing outside of their designated flow paths.

High density nucleic acid arrays can be fabricated by depositing presynthesized
or natural nucleic acids in predetermined positions. Synthesized or natural nucleic
acids are deposited on specific locations of a substrate by light directed targeting and
oligonucleotide directed targeting. Nucleic acids can also be directed to specific
locations in much the same manner as the flow channel methods. For example, a
nucleic acid A can be delivered to and coupled with a first group of reaction regions
which have been appropriately activated. Thereafter, a nucleic acid B can be delivered
to and reacted with a second group of activated reaction regions. Nucleic acids are
deposited in selected regions. Another embodiment uses a dispenser that moves from
region to region to deposit nucleic acids in specific spots. Typical dispensers include a
micropipet or capillary pin to deliver nucleic acid to the substrate and a robotic system
to control the position of the micropipet with respect to the substrate. In other
embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes
or capillary pins, or the like so that various reagents can be delivered to the reaction
regions simultaneously.






Hybridization Conditions

The term "stringent conditions" refers to conditions under which a probe will
hybridize to its target subsequence, but with only insubstantial hybridization to other
sequences or to other sequences such that the difference may be identified. Stringent
conditions are sequence-dependent [...]
Longer sequences hybridize specifically at higher temperatures. Generally, stringent
conditions are selected to be about 5°C lower than the thermal melting point (Tm) for
the specific sequence at a defined ionic strength and pH.

The Tm is the temperature, under defined ionic strength, pH, and nucleic acid
concentration, at which 50% of the probes complementary to the target sequence
hybridize to the target sequence at equilibrium. As the target sequences are generally
[...] stringent conditions will be those in which the salt concentration is at least about 0.01 to
1.0 M concentration of a Na or other salt at pH 7.0 to 8.3 and the temperature is at least
about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also
be achieved with the addition of destabilizing agents such as formamide.

The phrase "hybridizing specifically to" refers to the binding, duplexing, or
hybridizing of a molecule substantially to or only to a particular nucleotide sequence or
sequences under stringent conditions when that sequence is present in a complex
mixture (e.g., total cellular) of DNA or RNA. It is generally recognized that nucleic
acids are denatured by increasing the temperature or decreasing the salt concentration of
the buffer containing the nucleic acids. Under low stringency conditions (e.g., low
temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or
RNA:DNA) will form even where the annealed sequences are not perfectly
complementary. Thus, specificity of hybridization is reduced at lower stringency.
Conversely, at higher stringency (e.g., higher temperature or lower salt) successful
hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions can be
selected to provide any degree of stringency. In a preferred embodiment, hybridization
is performed at low stringency, in this case in 6X SSPE-T at 37°C (0.005% Triton X-100),
to ensure hybridization, and then subsequent washes are performed at higher
stringency (e.g., 1X SSPE-T at 37°C) to eliminate mismatched hybrid duplexes.
Successive washes can be performed at increasingly higher stringency (e.g., down to as
low as 0.25X SSPE-T at 37°C to 50°C) until a desired level of hybridization specificity
is obtained. Stringency can also be increased by addition of agents such as formamide.
Hybridization specificity can be evaluated by comparison of hybridization to the test
probes with hybridization to the various controls that can be present (e.g., expression
level control, normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and
signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest
stringency that produces consistent results and that provides a signal intensity greater
than approximately 10% of the background intensity. Thus, in a preferred embodiment,
the hybridized array can be washed at successively higher stringency solutions and read
between each wash. Analysis of the data sets thus produced will reveal a wash
stringency above which the hybridization pattern is not appreciably altered and which
provides adequate signal for the particular oligonucleotide probes of interest.
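
The selection of a wash stringency from successive reads can be sketched in Python as
follows. This is only one possible reading of the criterion stated above: the function
name, the tuple layout, and the use of a caller-supplied flag to represent "consistent
results" are assumptions of the sketch, and the 10% figure is taken directly from the
text.

```python
def pick_wash(washes, background, min_fraction=0.10):
    """Return the label of the most stringent acceptable wash, or None.

    washes: list of (label, signal, consistent) tuples ordered from the
    lowest to the highest stringency. `consistent` is a flag supplied by
    the caller; how consistency is judged is outside this sketch.
    """
    chosen = None
    for label, signal, consistent in washes:
        # Keep the latest (most stringent) wash that meets both criteria.
        if consistent and signal > min_fraction * background:
            chosen = label
    return chosen


if __name__ == "__main__":
    reads = [
        ("6X SSPE-T", 900.0, True),
        ("1X SSPE-T", 400.0, True),
        ("0.25X SSPE-T", 5.0, True),  # hypothetical read with too little signal
    ]
    print(pick_wash(reads, background=100.0))
```

In the hypothetical data shown, the 0.25X wash falls below the signal criterion, so the
1X wash is selected.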

The stability of duplexes formed between RNAs or DNAs is generally in the
order of RNA:RNA > RNA:DNA > DNA:DNA, in solution. Long probes have better
duplex stability with a target, but poorer mismatch discrimination than shorter probes
(mismatch discrimination refers to the measured hybridization signal ratio between a
perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers)
discriminate mismatches very well, but the overall duplex stability is low.

Altering the thermal stability (Tm) of the duplex formed between the target and
the probe using, e.g., known oligonucleotide analogues allows for optimization of
duplex stability and mismatch discrimination. One useful aspect of altering the Tm
arises from the fact that adenine-thymine (A-T) duplexes have a lower Tm than
guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have two
hydrogen bonds per base-pair, while the G-C duplexes have three hydrogen bonds per
base pair. In heterogeneous oligonucleotide arrays in which there is a non-uniform
distribution of bases, it is not generally possible to optimize hybridization for each
oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to
selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes.
This can be accomplished, e.g., by substituting guanine residues in the probes of an
array which form G-C duplexes with hypoxanthine, or by substituting adenine residues
in probes which form A-T duplexes with 2,6 diaminopurine or by using tetramethyl
ammonium chloride (TMACl) in place of NaCl.
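
To illustrate why probes of equal length can differ in duplex stability, the following
Python sketch counts the hydrogen bonds in a perfectly matched DNA duplex using the
figures given above (two per A-T pair, three per G-C pair). The function name is an
assumption of this sketch, and the count is a rough indicator only; it is not a Tm
calculation and ignores analogues such as hypoxanthine or 2,6 diaminopurine.

```python
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bond_count(probe):
    """Count Watson-Crick hydrogen bonds for a probe paired with its complement."""
    return sum(HYDROGEN_BONDS[base] for base in probe.upper())


if __name__ == "__main__":
    for probe in ("ATATATAT", "GCGCGCGC"):  # hypothetical 8-mers
        print(probe, hydrogen_bond_count(probe))
```

The A-T-rich 8-mer forms 16 hydrogen bonds and the G-C-rich 8-mer forms 24, which is the
non-uniformity that the substitutions described above are intended to reduce.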

Altered duplex stability conferred by using oligonucleotide analogue probes can
be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide
analogue arrays hybridized with a target oligonucleotide over time. The data allow
optimization of specific hybridization conditions at, e.g., room temperature.

Another way of verifying altered duplex stability is by following the signal
intensity generated upon hybridization with time. Previous experiments using DNA
targets and DNA chips have shown that signal intensity increases with time, and that the
more stable duplexes generate higher signal intensities faster than less stable duplexes.
The signals reach a plateau or "saturate" after a certain amount of time due to all of the
binding sites becoming occupied. These data allow for optimization of hybridization,
and determination of the best conditions at a specified temperature.

Methods of optimizing hybridization conditions are well known to those of skill
in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology,
Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y.,
(1993)).



Signal Detection 

The hybridized nucleic acids can be detected by detecting one or more labels
attached to the target nucleic acids. The labels can be incorporated by any of a number
of means well known to those of skill in the art. However, in a preferred embodiment
the label is incorporated by labeling the primers prior to the amplification step in the
preparation of the target nucleic acids. Thus, for example, polymerase chain reaction
with labeled primers will provide a labeled amplification product.

Detectable labels suitable for use in the present invention include any
composition detectable by spectroscopic, photochemical, biochemical,
immunochemical, electrical, optical, or chemical means. Useful labels in the present
invention include biotin for staining with labeled streptavidin conjugate, magnetic beads
(e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, texas red, rhodamine, green
fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes
(e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an
ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g.,
polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels
include U.S. Patent Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437;
4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus,
for example, radiolabels can be detected using photographic film or scintillation
counters, and fluorescent markers can be detected using a photodetector to detect emitted
light. Enzymatic labels are typically detected by providing the enzyme with a substrate
and detecting the reaction product produced by the action of the enzyme on the
substrate, and colorimetric labels are detected by simply visualizing the colored label.
One method uses a colloidal gold label that can be detected by measuring scattered light.

Means of detecting labeled target nucleic acids hybridized to the probes of the
array are known to those of skill in the art. Thus, for example, where a colorimetric
label is used, simple visualization of the label is sufficient. Where a radioactive labeled
probe is used, detection of the radiation (e.g., with photographic film or a solid state
detector) is sufficient.

Detection of target nucleic acids which are labeled with a fluorescent label (i.e.,
a "color tag") can be accomplished with fluorescence microscopy. The hybridized array
can be excited with a light source at the excitation wavelength of the particular
fluorescent label and the resulting fluorescence at the emission wavelength is detected.
The excitation light source can be a laser appropriate for the excitation of the
fluorescent label.

The confocal microscope can be automated with a computer-controlled stage to
automatically scan the entire high density array, to sequentially examine individual
probes or adjacent groups of probes in a systematic manner until all probes have been
examined. Similarly, the microscope can be equipped with a phototransducer (e.g., a
photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data
acquisition system to automatically record the fluorescence signal produced by
hybridization to each oligonucleotide probe on the array. Such automated systems are
described at length in U.S. Patent No. 5,143,854, PCT Application WO 92/10092, and
copending U.S. Application Ser. No. 08/195,889, filed on February 10, 1994. Use of
laser illumination in conjunction with automated confocal microscopy for signal
detection permits detection at a resolution of better than about 100 µm, more preferably
better than about 50 µm, and most preferably better than about 25 µm.

Two different fluorescent labels can be used in order to distinguish two alleles at 
each marker examined. In such a case, the array can be scanned two times. During the 
first scan, the excitation and emission wavelengths are set as required to detect one of
the two fluorescent labels. For the second scan, the excitation and emission 
wavelengths are set as required to detect the second fluorescent label. When the results 
from both scans are compared, the genotype identification or allele frequency can be 
determined. 

Quantification and Determination of Genotypes 

The term "quantifying" when used in the context of quantifying hybridization of
a nucleic acid sequence or subsequence can refer to absolute or to relative
quantification. Absolute quantification can be accomplished by inclusion of known
concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such as
Bio B, or known amounts of the target nucleic acids themselves) and referencing the
hybridization intensity of unknowns with the known target nucleic acids (e.g., through
generation of a standard curve). Alternatively, relative quantification can be
accomplished by comparison of hybridization signals between two or more genes, or
between two or more treatments to quantify the changes in hybridization intensity and,
by implication, the frequency of an allele. Relative quantification can also be used to
merely detect the presence or absence of an allele in the target nucleic acids. In one
embodiment, for example, the presence or absence of the two alleles of a marker can be
determined by comparing the quantities of the first and second color tag at the known
locations in the array, i.e., on the solid support, which correspond to the allele-specific
probes for the two alleles.

A preferred quantifying method is to use a confocal microscope and fluorescent
labels. The GeneChip® system (Affymetrix, Santa Clara, CA) is particularly suitable
for quantifying the hybridization; however, it will be apparent to those of skill in the art
that any similar system or other effectively equivalent detection method can also be
used.

Methods for evaluating the hybridization results vary with the nature of the
specific probes used, as well as the controls. Simple quantification of the fluorescence
intensity for each probe can be determined. This can be accomplished simply by
measuring signal strength at each location (representing a different probe) on the high
density array (e.g., where the label is a fluorescent label, detection of the fluorescence
intensity produced by a fixed excitation illumination at each location on the array).

One of skill in the art, however, will appreciate that hybridization signals will 
vary in strength with efficiency of hybridization, the amount of label on the sample 
nucleic acid and the amount of the particular nucleic acid in the sample. Typically 
nucleic acids present at very low levels (e.g., < I P M) will show a very weak signal. At 
05 some low level of concentration, the signal becomes virtually indistinguishable from 
background. In evaluating the hybridization data, a threshold intensity value can be 
selected below which a signal is counted as being essentially indistinguishable from 
background. 
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The terms "background" or "background signal intensity" refer to hybridization 
signals resulting from non-specific binding, or other interactions, between the labeled 
target nucleic acids and components of the oligonucleotide array (e.g., the 
oligonucleotide probes, control probes, the array substrate, etc.). Background signals 
may also be produced by intrinsic fluorescence of the array components themselves. A 
single background signal can be calculated for the entire array, or a different 
background signal may be calculated for each target nucleic acid. In a preferred 
embodiment, background is calculated as the average hybridization signal intensity for 
the lowest 5% to 10% of the probes in the array, or, where a different background signal 
is calculated for each target allele, for the lowest 5% to 10% of the probes for each 
allele. However, where the probes to a particular allele hybridize well and thus appear 
to be specifically binding to a target sequence, they should not be used in a background 
signal calculation. Alternatively, background may be calculated as the average 
hybridization signal intensity produced by hybridization to probes that are not 
complementary to any sequence found in the sample (e.g., probes directed to nucleic 
acids of the opposite sense or to genes not found in the sample, such as bacterial genes 
where the sample is mammalian nucleic acids). Background can also be calculated as 
the average signal intensity produced by regions of the array that lack any probes at all. 
In a preferred embodiment, background signal is reduced by the use of a detergent (e.g., 
C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the 
hybridization to reduce non-specific binding. In a particularly preferred embodiment, 
the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring 
sperm DNA). The use of blocking agents in hybridization is well known to those of 
skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra). 
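
By way of illustration only, the preferred background calculation (the average of the lowest 5% to 10% of probe intensities) might be sketched as follows. The function name, the default fraction, and the example intensities are assumptions introduced for this sketch and are not part of the disclosure.

```python
def estimate_background(intensities, fraction=0.05):
    """Average the lowest `fraction` of probe intensities (0.05 to 0.10 in the text)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    values = sorted(float(v) for v in intensities)
    if not values:
        raise ValueError("no probe intensities supplied")
    # Always keep at least one probe so that small arrays still give a value.
    n_low = max(1, int(len(values) * fraction + 1e-9))
    lowest = values[:n_low]
    return sum(lowest) / len(lowest)


# Hypothetical raw intensities for a 20-probe array.
example_intensities = [12, 15, 11, 14, 900, 850, 40, 35, 1200, 13,
                       16, 18, 700, 650, 20, 22, 19, 17, 300, 10]
background_5pct = estimate_background(example_intensities, fraction=0.05)
background_10pct = estimate_background(example_intensities, fraction=0.10)
```

The same function could be applied per allele by passing only the probes for that allele. Probes that hybridize well sit at the top of the sorted list and so are not drawn into the average.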
The high density array can include mismatch controls. In a preferred 
embodiment, there is a mismatch control having a central mismatch for every probe in 
the array, except the normalization controls. It is expected that after washing in 
stringent conditions, where a perfect match would be expected to hybridize to the probe, 
but not to the mismatch, the signal from the mismatch controls should only reflect non- 
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mismatch. Where both the probe in question and its corresponding mismatch control 
show high signals, or the mismatch shows a higher signal than its corresponding test 
probe, there is a problem with the hybridization and the signal from those probes is 
ignored. For a given marker, the difference in hybridization signal intensity (I_allele1 - 
I_allele2) between an allele-specific probe (perfect match probe) for a first allele and the 

of the presence of or concentration of the first allele. Thus, in a preferred embodiment, 
the signal of the mismatch probe is subtracted from the signal for its corresponding test 
probe to provide a measure of the signal due to specific binding of the test probe. 

The concentration of a particular sequence can then be determined by measuring 
the signal intensity of each of the probes that bind specifically to that gene and 
normalizing to the normalization control. Where the signal from the probes is greater 
than the mismatch, the mismatch is subtracted. Where the mismatch intensity is equal 
to or greater than its corresponding test probe, the signal is ignored (i.e., the signal 
cannot be evaluated). 
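
As a minimal sketch of the mismatch subtraction rule just described, the following assumes that each probe is represented by a (perfect match, mismatch) intensity pair. Averaging the usable probes and dividing by a normalization value are assumptions made for the sketch; the text does not fix how the probes for one sequence are combined.

```python
def specific_signal(pm, mm):
    """Return PM minus MM, or None when the probe pair cannot be evaluated."""
    if mm >= pm:
        # Mismatch equal to or greater than the test probe: signal is ignored.
        return None
    return pm - mm


def sequence_signal(pairs, normalization=1.0):
    """Average the usable PM - MM values and scale by a normalization control.

    `pairs` is an iterable of (pm, mm) intensities; both the averaging and
    the `normalization` argument are illustrative assumptions.
    """
    usable = [s for s in (specific_signal(pm, mm) for pm, mm in pairs)
              if s is not None]
    if not usable:
        return None
    return sum(usable) / len(usable) / normalization


# Hypothetical probe pairs: the second pair is ignored because MM > PM.
example_pairs = [(500, 100), (300, 400), (700, 100)]
example_signal = sequence_signal(example_pairs, normalization=2.0)
```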
For each marker analyzed, the genotype can be unambiguously determined by 
comparing the hybridization patterns obtained for each of the two labels, e.g., color tags 
employed (Fig. 8). If hybridization is indicated for one color tag to its corresponding 
allele-specific probe (e.g., "A") but not for the other color tag (e.g., "G") (pattern at left 
in Fig. 8), then the indicated genotype of a diploid organism would be homozygous 
A/A. If hybridization is indicated only for the other color tag to its corresponding 
allele-specific probe (e.g., "G") (pattern at center in Fig. 8), then the indicated genotype 
of a diploid organism would be homozygous G/G. If hybridization is indicated for both 
color tags to their corresponding allele-specific probes (pattern at right in Fig. 8), then 
the indicated genotype of a diploid organism would be heterozygous (A/G). 
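
The three hybridization patterns can be expressed as a simple decision rule. The sketch below is illustrative only: it assumes that "hybridization is indicated" means a background-corrected signal at or above a chosen threshold, and the function and parameter names are hypothetical.

```python
def call_genotype(signal_a, signal_g, threshold, allele_a="A", allele_g="G"):
    """Call a diploid genotype from the two color-tag signals for one marker.

    A signal below `threshold` is treated as indistinguishable from
    background. Returns None when neither color tag shows hybridization.
    """
    has_a = signal_a >= threshold
    has_g = signal_g >= threshold
    if has_a and has_g:
        return f"{allele_a}/{allele_g}"   # heterozygous
    if has_a:
        return f"{allele_a}/{allele_a}"   # homozygous for the first allele
    if has_g:
        return f"{allele_g}/{allele_g}"   # homozygous for the second allele
    return None


# Hypothetical signals with an assumed threshold of 100 intensity units.
left_pattern = call_genotype(900, 20, threshold=100)
center_pattern = call_genotype(15, 850, threshold=100)
right_pattern = call_genotype(480, 510, threshold=100)
```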

Minimal detection of hybridization, indicated by an intermediate positive result 
(e.g., less than 1%, or from 1-5%, or from 1-10%, or from 2-10%, or from 5-10%, or 
from 1-20%, or from 2-20%, or from 5-20%, or from 10-20% of the average of all 
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positive hybridization results obtained for the entire array) may indicate either cross- 
hybridization or cross-amplification, depending on the overall hybridization pattern as 
indicated in Fig. 8. However, these can be distinguished by the unique pattern 
observed. Further procedures for data analysis are disclosed in U.S. Application 
08/772,376, previously incorporated for all purposes. 

HuSNP and other marker-specific arrays have been designed and used in genetic 
studies 9, 10. But the method developed in this study provides several advantages in 
dealing with many different genetic applications: (1) arrays based on a single generic 
design can be used to genotype different sets of genetic markers because no specific 
customized genotyping array is needed; (2) the pre-selected probe sequences 
synthesized on the tag array help ensure good hybridization results; (3) accurate 
quantitative measurement of the allele frequency in the tested samples can be achieved. 
Thus, reliable genotype results can be obtained not only for individual samples, but also 
for pooled samples. Besides SBE, other assays can be coupled with the tag array assay, for 
example, oligonucleotide ligation assay (OLA) 19, 21, invasive cleavage of oligonucleotide 
probes assay 22, and allele specific PCR 23, 24. 

Our current tag chip contains over 32,000 unique tag probes. Most genetic 
applications, for example, detecting mutations in one particular gene, do not 
need such a high-density chip. Therefore, smaller chips with fewer tags are 
sought after. Alternatively, multiple tags corresponding to one particular marker can be 
designed so as to build redundancy into the assay to assure accurate genotyping. Or 
multiple sets of tags for one set of SNPs can be designed, so that multiple samples can be 
processed and analyzed with one chip. Our current assay uses a two-color labeling 
scheme, but a four-color labeling/scanning system should allow the assay to be done 
in a single tube reaction. 

For broader genetic applications, for example, a study that needs to genotype 100s to 
1000s of genetic markers, amplifying the genetic loci with multiplex PCR is still the 
best strategy. However, to genotype 1000s to 10,000s of markers, pre-amplification of the 
genetic loci of interest will be very labor-intensive and costly. A whole-genome 
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approach should be explored, for example, strategies involved using total human 
genomic DNA directly, or genomic DNA amplified using some general amplification 
methods, e.g., primer-extension amplification (PEP), or total cDNA. In fact, we have 
tried to use total human genomic DNA directly as the SBE template in our tag array 
assay. 24 out of the 38 markers that we tested gave good signals (data not shown). 
Nevertheless, a large amount of work is needed to solve both the sensitivity (signal 
intensity) and specificity (mis-priming) problems before the whole-genome approach 
becomes really useful. 

The invention will be further illustrated by the following non-limiting examples. 
The content of references cited herein is incorporated herein by reference in its entirety. 

EXEMPLIFICATION 

METHODS 

Collection and Isolation of DNA From Samples 

DNA samples were collected by GenNet as part of the ongoing Family Blood 
Pressure Program. Samples were collected with consent and IRB approval in both 
Tecumseh, MI and Loyola, IL families. Ascertainment was based on identification 
of a proband in the top 15th (Tecumseh) or 20th (Loyola) percentile of the community's 
blood pressure distribution. Full phenotypic information was obtained for each 
individual. DNA was extracted from 5-10 ml of whole blood taken from each individual 
using the standard "salting-out" method (Gentra Systems). 

Primer Design 

For each SNP, primary PCR amplification primers were designed as described 
previously. The SBE primer was designed so that its 3' end terminates one base 
before the polymorphic site. The Primer 3.0 software package 
(http://www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi) was modified and used to 
pick SBE primers from batch sequences, at a predicted length of 20 (ranging from 18 to 
26) nucleotides and a melting temperature of 60°C (ranging from 54°C to 64°C). The SBE 
primers were always picked from the forward direction first (i.e., 5' to the polymorphic 
site). If the SBE primer cannot be picked from the forward direction, the reverse direction is 
tried. 
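
To illustrate the rule that an SBE primer terminates one base before the polymorphic site, the sketch below parses a flanking sequence written in the notation used in Table 1, e.g. CCAGGAGGCA(T/C)CCCAACAGGT, and returns the bases immediately 5' of the polymorphism for a forward-direction primer. The function names are hypothetical, and the length and melting-temperature selection performed by the modified Primer 3.0 package is not reproduced here.

```python
import re

# Flank notation: upstream bases, "(allele1/allele2)", downstream bases.
FLANK_PATTERN = re.compile(r"^([ACGT]+)\(([ACGT])/([ACGT])\)([ACGT]+)$")


def parse_flank(flank):
    """Split a flanking sequence into (upstream, (allele1, allele2), downstream)."""
    match = FLANK_PATTERN.match(flank.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized flank notation: {flank!r}")
    upstream, allele1, allele2, downstream = match.groups()
    return upstream, (allele1, allele2), downstream


def forward_sbe_3prime(flank, length):
    """Return the last `length` upstream bases, ending one base before the SNP."""
    upstream, alleles, _ = parse_flank(flank)
    if length > len(upstream):
        raise ValueError("not enough upstream sequence for the requested length")
    return upstream[-length:], alleles


primer_end, alleles = forward_sbe_3prime("CCAGGAGGCA(T/C)CCCAACAGGT", 8)
```

For the flanking sequence shown, the returned 3' end is AGGAGGCA, which is also how one of the SBE primers listed in Table 1 ends. In a full primer this locus-specific portion would follow the tag sequence at the 5' end.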

printtr, . «M deoxynuctamde uipho^s (dNTPs). 10 mM T„,HC, (pH S 3,. 50 
rKa5,nMM 8 C,.and2» nia ofA mP UTa,Go,d(P«rtdn6 ta «r) ln »^va,««of 

«,«I«.M— I--"* *»— * JTC 1.40— C 

mto ,c and 30 ««* «i *« " « 72 ' C ^ *° m ""' K ' 1 

v. r ifr Science 1 U/UD were added to a 25 ^1 P*-*- 
15 Alkaline Phosphatase (Amersham Life Science, lu^l 

and replace the buffer with ddH2O. 

Multiplexing SBE Reaction 

SBEiscarriedoutin a 33,lre«^^ 

^ofeachSBEprirner^unitso^^^^^ 

(pH 9.5), 6.5 mM MgCl2, 25 µM of fluorescein-N6-ddNTPs (NEN), 7.5 µM 
biotin-N6-ddUTP or biotin-N6-dCTP, or 3.75 µM biotin-N6-ddATP, and 10 µM of the 
other cold ddNTPs. 
Extension reaction was carried out on a Thermo Cycler (MJ Research), with 1 
cycle of 96°C for 3 minutes, then 45 cycles of 94°C for 20 seconds and 58°C for 11 
seconds. 

After the SBE reaction, 9 reactions from each sample were combined and mixed 
with 30 µl of 100 µg/ml glycogen (Boehringer Mannheim), 18.75 µl of 8 M LiCl 
(Sigma), and 1125 µl of pre-chilled (-20°C) ethanol (Abs.), and precipitated by 
centrifugation at top speed (Eppendorf centrifuge 5415C) for 15 minutes at room 
temperature; precipitated samples were dried at 40°C for 40 minutes and re-suspended 
in 33 µl ddH2O. 

Tag Array Design and Hybridization 

For each tag sequence, two probes were synthesized on the array. One is exactly 
the designed tag sequence (referred to as a Perfect Match, or PM probe). The other one 
is identical except for a single base difference in a central position (referred to as a 
Mismatch, or MM probe). The mismatch probe serves as an internal control for 
hybridization specificity. Over 32,000 20-mer tag probes (and their companions) were 
chosen 11 and fabricated on an 8 mm x 8 mm array. Each probe (feature) occupies a 
30 micron x 30 micron area. The sets of arrays were synthesized together on a single 
glass wafer on which 100 arrays were made. 
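
The PM/MM pairing and the array dimensions can be checked with a short sketch. Two points are assumptions made here and not stated in the text: the mismatch base is taken to be the complement of the perfect match base, and the "central position" of a 20-mer is taken to be index 10 (zero-based). The example tag sequence is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_probe):
    """Return an MM probe differing from the PM probe at one central base."""
    seq = pm_probe.upper()
    center = len(seq) // 2
    # Assumed substitution rule: replace the central base by its complement.
    return seq[:center] + COMPLEMENT[seq[center]] + seq[center + 1:]


def feature_capacity(array_side_um, feature_side_um):
    """Number of square features that fit on a square array."""
    per_side = array_side_um // feature_side_um
    return per_side * per_side


pm = "ACGTACGTACGTACGTACGT"           # hypothetical 20-mer tag probe
mm = mismatch_probe(pm)
capacity = feature_capacity(8000, 30)  # 8 mm x 8 mm array, 30 micron features
probes_needed = 2 * 32000              # one PM and one MM per tag
```

With these dimensions the array holds 266 x 266 = 70,756 features, which accommodates the 64,000 features needed for 32,000 PM/MM pairs.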

The labeled sample was denatured at 95°C - 100°C for 10 minutes and snap 
cooled on ice for 2 - 5 minutes. The tag array was pre-hybridized with 6X SSPE-T (0.9 
M NaCl, 60 mM NaH2PO4, 6 mM EDTA (pH 7.4), 0.005% Triton X-100) + 0.5 mg/ml 
of BSA for a few minutes, then hybridized with 120 µl hybridization solution (as shown 
below) at 42°C for 2 hours on a rotisserie, at approximately 40 RPM. Hybridization Solution 
consists of 3 M TMACL (Tetramethylammonium Chloride), 50 mM MES 
(2-[N-Morpholino]ethanesulfonic acid, Sodium Salt) (pH 6.7), 0.01% of Triton X-100, 
0.1 mg/ml of Herring Sperm DNA, 50 pM of fluorescein-labeled control oligo, 0.5 
mg/ml of BSA (Sigma) and 29.4 µl labeled SBE products (see below) in a total of 120 
µl reaction. 
The chips were rinsed twice with 1X SSPE-T for about 10 seconds at room 
temperature, then washed with 1X SSPE-T for 15 - 20 minutes at 40°C on a rotisserie, 
at approximately 40 RPM. The chips were then washed 10 times with 6X SSPE-T at 22°C 
on a fluidic station (FS400, Affymetrix). The chips were stained at room temperature with 120 µl 
staining solution (2.2 µg/ml streptavidin R-phycoerythrin (Molecular Probes), and 0.5 
mg/ml acetylated BSA, in 6X SSPE-T) on a rotisserie for 15 minutes, at approximately 40 RPM. 
After staining, the probe array was washed 10 times again with 6X SSPE-T on the 
FS400 at 22°C. The chips were scanned on a confocal scanner (Affymetrix) with a 
resolution of 60-70 pixels per feature, and two filters (530-nm and 560-nm, 
respectively). GeneChip Software (Affymetrix) is used to convert the image files into 
digitized files for further data analysis. 

Clustering Analysis 

For a given marker (at a given tag probe position), the intensity of each of the 
two colors (fluorescein and phycoerythrin) was calculated as the intensity at the perfect 
match position (PM) minus that at the mismatch position (MM). Negative fluorescein 
or phycoerythrin intensity values are treated as if they were zero. The Phat values were 
computed as the ratio of the intensities, fluorescein / (fluorescein + phycoerythrin). The 
Phat values were sorted, and the optimal set of ranges for AA, AB and BB genotypes 
given the hypothesis of 2 or 3 clusters was considered, subject to the following rules: at 
most 4 points (outliers) may be excluded from the genotype ranges. For 2 groups, the 
total range of Phat values must be at least 0.3. For 3 groups, the total range of Phat values 
must be at least 0.5. Ranges must be separated by a gap of at least 0.1. The width of a 
range may be at most 0.4. A score was then computed as: Score = 1 - (sum of range 
widths / total range) - (outliers * 0.1). 

The set of ranges with the best score was found and used to call genotypes. This 
score increases as the ranges narrow, and decreases with the number of points that are 
left out of any range. Therefore, it tends to be optimal when all the Phat values are 
contained within relatively small ranges. 
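
The Phat computation, the range rules, and the score can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: "total range" is taken to be the span from the start of the lowest range to the end of the highest range, ranges are given as sorted (low, high) pairs, and the search over candidate sets of ranges is not shown.

```python
def phat(fluor_pm, fluor_mm, phyco_pm, phyco_mm):
    """Fluorescein fraction of the total PM - MM signal, negatives set to zero."""
    fluorescein = max(0.0, fluor_pm - fluor_mm)
    phycoerythrin = max(0.0, phyco_pm - phyco_mm)
    total = fluorescein + phycoerythrin
    return None if total == 0 else fluorescein / total


def ranges_valid(ranges, outliers):
    """Check a candidate set of 2 or 3 genotype ranges against the stated rules."""
    if len(ranges) not in (2, 3) or outliers > 4:
        return False
    total_range = ranges[-1][1] - ranges[0][0]  # assumed definition
    if total_range < (0.3 if len(ranges) == 2 else 0.5):
        return False
    if any(high - low > 0.4 for low, high in ranges):
        return False
    gaps = [ranges[i + 1][0] - ranges[i][1] for i in range(len(ranges) - 1)]
    return all(gap >= 0.1 for gap in gaps)


def cluster_score(ranges, outliers):
    """Score = 1 - (sum of range widths / total range) - (outliers * 0.1)."""
    total_range = ranges[-1][1] - ranges[0][0]
    widths = sum(high - low for low, high in ranges)
    return 1 - widths / total_range - outliers * 0.1


# Hypothetical BB, AB and AA ranges with one excluded outlier.
candidate = [(0.02, 0.08), (0.45, 0.55), (0.92, 0.98)]
candidate_ok = ranges_valid(candidate, outliers=1)
candidate_score = cluster_score(candidate, outliers=1)
```

In a complete implementation, every admissible partition of the sorted Phat values into 2 or 3 ranges would be scored and the highest-scoring valid set used to call genotypes.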
ABI Sequencing to Determine Genotypes 

^.DN^ 0.3,5 
a nrimers usina standard PCR cycling conditions (2.5 ot -u W 
5 ^dpnmers. us.n « f> Q3 4 2SmM Mg > , 0.15 »1 

al oPO liM pnmer (X2), 1.3 ^ . 
fol dL 0 25 0 10 U|U T* DNA Po.ymtfa* (Sisma), bro.gb, »P . I. u> 

View 1 .0 (Perkin Elmer) software. 

EXAMPLE 1 

DNA torn a individual i, isoiaKd. - "PH<" fom ' 5 

15 L^H^TX^MS^O:,^,^ 

Genoa 7-210-245: Dienich. W. o«(..G«n«° l^- 423 - 447 ll99 - H ' . 
to 1 µM forward amplification primer, 1 µM reverse amplification primer, 200 µM 
dGTP, 200 µM dTTP, 200 µM dATP, 3.5 mM MgCl2, 1.0 mM Tris-HCl (pH 8.3), 50 
mM KCl, 0.02 µM molecular probe, and 0.25 units of polymerase enzyme. The 
reaction mixture can then be subjected to a two-step amplification process, performed 
on a Tetrad (MJ Research, Watertown, Massachusetts), with the conditions: 
denaturation at 94°C for 60 seconds, followed by an annealing/extension step at 
53°-56°C for one minute. The denaturation and annealing/extension steps are repeated for 
40 cycles. Alternatively, a three-step thermocycling reaction can be used, such as 94°C 
for 60 seconds, followed by annealing at 53°-56°C for 30 seconds, followed by 
extension at 72°C for one minute, the three steps being repeated for 40 cycles. This may 
be followed by an optional extension step at 72°C for five minutes. 

After amplification is complete, locus-specific tagged oligonucleotides specific 
for the 10 SNPs are added, and are allowed to hybridize to the amplification products. 

Reagents for a single base extension reaction are then added, where each of the 
four ddNTPs is labeled with a different fluorophore. Single base extension is then 
performed as described by Kobayashi et al. (Mol. Cell. Probes 9:175-182 (1995)). 

After the reaction is complete, the reaction products are placed in contact with 
the universal array, and the reaction products allowed to hybridize, each product to its 
appropriate oligonucleotide tag on the array. The chip is then assayed in a fluorometer, 
and the wavelength emitted at each address in the array is recorded. From this data, the 
genotype at each individual SNP is determined. 

EXAMPLE 2 

Two alleles of template were mixed at ratios of 1:30, 1:10, 1:3, 1:1, 3:1, 10:1, 
and 30:1. These were labeled with different color labels by single-base extension 
reaction and hybridized to a tag array. A correlation was observed between the signal 
intensity ratio and the template concentration ratio over a 900-fold dynamic range. See 
Figure 2. 
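
The 900-fold figure follows from the extreme mixing ratios (30:1 divided by 1:30). A short worked check, using exact fractions and introducing only illustrative names, also gives the expected fraction of the first allele in each mixture:

```python
from fractions import Fraction

# Mixing ratios (first allele : second allele) as listed in Example 2.
RATIOS = [(1, 30), (1, 10), (1, 3), (1, 1), (3, 1), (10, 1), (30, 1)]


def allele_fraction(first, second):
    """Expected fraction of the first allele in a first:second mixture."""
    return Fraction(first, first + second)


def fold_range(ratios):
    """Largest ratio divided by the smallest ratio."""
    values = [Fraction(first, second) for first, second in ratios]
    return max(values) / min(values)


dynamic_range = fold_range(RATIOS)
expected_fractions = [allele_fraction(a, b) for a, b in RATIOS]
```

The expected first-allele fractions run from 1/31 (about 3%) to 30/31 (about 97%), which is the span over which the signal intensity ratio was compared with the template ratio.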
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EXAMPLE 3 

A set of tag sequences is selected such that the tags are likely to have similar 
hybridization characteristics and minimal cross-hybridization to other tag sequences. 
An oligonucleotide array of all of the tags is fabricated. The design and use of such a 
4,000-20mer-tag array for the functional analysis of the yeast genome has been 
described (1). More recently, Affymetrix designed and fabricated an array with a set of 
more than 16,000 such tags. The tag sequence synthesized on the chip can be 20-mer, 
25-mer, or other lengths. 



EXAMPLE 4 

Marker specific primers are used to amplify each genetic marker (e.g. SNP). A 
multiplex PCR strategy is used to amplify these markers from genomic DNAs of tested 
individuals (2). After PCR amplification, excess primers and dNTPs are removed 
enzymatically. These enzymatically treated PCR products then serve as templates in the 
next SBE reaction. Please note that these templates (PCR products) are double 
stranded, which are different from the templates used in other protocols (3, 4). For 
example, in Minisequencing (3) and Genetic Bit Analysis (GBA, 4), a double stranded 
template has to be converted to a single stranded template prior to the base extension 
reaction. The methods used for this conversion are costly, laborious, and hard to 
automate. 



EXAMPLE 5 

In the protocol described below, an SBE primer is designed for each genetic 
marker which terminates 1 base before the polymorphic site. However, other primer 
design schemes can be used. The primer for each marker is tailed with a unique tag 
which is complementary to a specific probe sequence synthesized on the tag chip. The 
extension reaction is multiplexed: SBE primers corresponding to multiple 
markers are added in a single reaction tube, and extended in the presence of pairs of 
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ddNTPs labeled with different fluorophores, e.g. for an A/C variant there might be a 
ddATP-red and ddCTP-green. 

EXAMPLE 6 

The resulting mixture is hybridized to the tag array. Each tag corresponds to a 
single marker. The ratio of the intensities of the colors indicates the genotype (or the 
allele frequency, ranging from 0% to 100%) of the samples tested. 

EXAMPLE 7 

SBE template preparation: Marker specific primers are used to amplify each 
single nucleotide polymorphism (SNP). A multiplex PCR strategy is used to amplify 
these SNPs (Science 280:1077-1082, 1998). 

Multiplex PCR: 

The multiplex PCR reaction is carried out with AmpliTaq Gold and 25 primer pairs 
in a 25 µl reaction volume. SNPs with the same base composition at the polymorphic site 
(i.e., A/G, T/C, etc.) are pooled together. 

PCR reagents: 

10X PCR Multiplex Buffer (II): 100 mM Tris/HCl (pH 8.3) 
                               500 mM KCl 
25 mM dNTPs 
F & R Primers (for each primer, the conc. is 1 µM) 
20 ng/µl Genomic DNA 

Multiplex PCR reaction (25 µl) 

Primer Mix (1 µM each)        2.5 µl 
Genomic DNA (20 ng/µl)        2.5 µl 
10X PCR Buffer II             2.5 µl 
25 mM MgCl2                   5 µl 
25 mM dNTPs                   1 µl 
AmpliTaq Gold (5 U/µl)        0.4 µl 
ddH2O                         up to 25 µl 

PCR conditions 

96°C     10 min 
40 cycles: 
   94°C  30 sec 
   57°C  40 sec 
   72°C  1 min 30 sec 
72°C     10 min 
4°C      O/N 

Enzymatic treatment of PCR products to degrade and de-phosphorylate the unused 
primers and dNTPs, respectively: 

To 25 µl of PCR products, add 1 µl of Exonuclease I (Amersham Life Science, 
10 U/µl) and 1 µl of Shrimp Alkaline Phosphatase (Amersham Life Science, 1 U/µl), 
and incubate at 37°C for 1 hour. Inactivate the enzyme activities at 100°C for 15 
minutes. Apply the sample to an S-300 column (Pharmacia) to further reduce the 
residual PCR primers and dNTPs, and replace the buffer with ddH2O. The sample is 
ready for the next SBE reaction. 
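
As a worked check of the multiplex PCR recipe, final concentrations in the 25 µl reaction can be derived from the stock concentrations and volumes by simple dilution (stock x volume / total volume). The sketch assumes the reagent volumes are 2.5, 2.5, 2.5, 5, 1 and 0.4 µl in the order the reagents are listed; the genomic DNA volume in particular is a reading of a poorly legible entry. All names are illustrative.

```python
TOTAL_UL = 25.0


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


# Volumes in µl, assumed to pair with the reagents in listed order.
volumes = {
    "primer_mix": 2.5,
    "genomic_dna": 2.5,   # assumed reading of the garbled entry
    "buffer_10x": 2.5,
    "mgcl2": 5.0,
    "dntps": 1.0,
    "amplitaq_gold": 0.4,
}

tris_mM = final_concentration(100.0, volumes["buffer_10x"])   # 10X buffer: 100 mM Tris
kcl_mM = final_concentration(500.0, volumes["buffer_10x"])    # 10X buffer: 500 mM KCl
mgcl2_mM = final_concentration(25.0, volumes["mgcl2"])
dntp_mM = final_concentration(25.0, volumes["dntps"])
primer_uM = final_concentration(1.0, volumes["primer_mix"])
taq_units = 5.0 * volumes["amplitaq_gold"]                     # 5 U/µl stock
water_ul = TOTAL_UL - sum(volumes.values())
```

Under these assumptions the reaction contains 10 mM Tris-HCl, 50 mM KCl, 5 mM MgCl2, 0.1 µM of each primer and 2 units of AmpliTaq Gold, with about 11.1 µl of water making up the volume.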
Single Base Extension (SBE): 

An SBE primer is designed for each SNP which terminates 1 base before the 
polymorphic site. The primer for each SNP is tailed with a unique tag which is 
complementary to a specific probe sequence on the tag chip. The SBE reaction is also 
multiplexed at 25-plex. 

Reaction Mixture (33 µl) 

Template (see above)                               6 µl 
SBE Primer mix (20 nM for each primer)             2.5 µl 
5X Thermo Sequenase buffer                         6.6 µl 
Bio-(d)dNTP (X nmol/µl*, NEN)                      0.5 µl 
Flu-ddNTP (1 nmol/µl, NEN)                         0.8 µl 
Other two cold-ddNTPs (1 nmol/µl, Biopharmacia)    0.3 µl each 
Thermo Sequenase (6.4 U/µl) (Amersham)             0.4 µl 
ddH2O                                              up to 33 µl 

* X = 0.5 when it is Bio-ddUTP or bio-dCTP (0.5 mM), or X = 0.25 when it is 
bio-ddATP (0.25 mM) 

PCR program: 
96°C   3'     1 cycle 
94°C   25" 
58°C   11"    45 cycles 
4°C    forever 
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Precipitation: 

After the SBE reaction, combine 9 tubes for each sample, mix with 30 µl of 
100 µg/ml glycogen (Boehringer Mannheim), then precipitate with 18.75 µl of 8 M 
LiCl and 1125 µl of pre-chilled (-20°C) ethanol (Abs.). Mix well; then centrifuge at 
top speed (Eppendorf centrifuge 5415C) for 15 min at room temperature. Decant the 
supernatant, dry the samples at 40°C for 40 min, and re-suspend the samples in 33 µl 
ddH2O. The sample is now ready for hybridization. 



Hybridization: 

The prepared sample is denatured at 100°C for 10 minutes and snap cooled on 
ice for 2-5 minutes. The universal tag chip is pre-hybridized with 6X SSPE-T (0.9 M 
NaCl, 60 mM NaH2PO4, 6 mM EDTA (pH 7.4), 0.005% Triton X-100) + 0.5 mg/ml of 
BSA, then hybridized with 120 µl hybridization solution (as shown below) at 42°C for 2 
hours on a rotisserie, at approximately 40 RPM. 

The hybridization solution contains: 

5 M TMACL              72 µl 
0.5 M MES (pH 6.7)     12 µl 
1% Triton X-100        1.2 µl 
HS DNA (10 mg/ml)      1.2 µl 
Flu-c213 (5 nM)        1.2 µl 
BSA (20 mg/ml) 

Plus 29.4 µl prepared sample (see above). 
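
The recipe can be cross-checked against the final concentrations stated in the Methods (3 M TMACL, 50 mM MES, 0.01% Triton X-100, 0.1 mg/ml herring sperm DNA, 50 pM control oligo, 0.5 mg/ml BSA in 120 µl). The sketch below takes the listed volumes as 72, 12, 1.2, 1.2, 1.2 and 29.4 µl. The BSA volume is not legible in the recipe, so the value computed here is only the volume implied by the 120 µl total, an inference rather than a stated figure.

```python
TOTAL_UL = 120.0
SAMPLE_UL = 29.4

# name: (stock concentration, unit, volume in µl), as read from the recipe.
components = {
    "TMACL": (5.0, "M", 72.0),
    "MES": (0.5, "M", 12.0),
    "Triton X-100": (1.0, "%", 1.2),
    "HS DNA": (10.0, "mg/ml", 1.2),
    "Flu-c213": (5.0, "nM", 1.2),
}


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


finals = {name: final_concentration(stock, vol)
          for name, (stock, _unit, vol) in components.items()}

# Volume left over for the 20 mg/ml BSA stock (inferred, not stated).
listed_ul = sum(vol for _stock, _unit, vol in components.values()) + SAMPLE_UL
implied_bsa_ul = round(TOTAL_UL - listed_ul, 6)
bsa_final_mg_ml = final_concentration(20.0, implied_bsa_ul)
```

The implied BSA volume is 3 µl, which gives 0.5 mg/ml and agrees with the Methods. The control oligo comes out at 0.05 nM, i.e. 50 pM.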

Post-Hybridization Wash: 

Rinse the chip twice with 1X SSPE-T for 10 seconds, then wash with 1X SSPE-T for 
15-20 min at 40°C on a rotisserie, at approximately 40 RPM. Then wash on a fluidic station 
(FS400, Affymetrix) 10 times with 6X SSPE-T at 22°C. 
Stain the chip at room temperature with 120 µl staining solution (2.2 µg/ml 
streptavidin R-phycoerythrin (Molecular Probes), and 0.5 mg/ml acetylated BSA, in 6X 
SSPE-T) on a rotisserie for 15 minutes, at approximately 40 RPM. After staining, the probe array was 
washed 10 times again with 6X SSPE-T on the FS400 at 22°C. 



data analysis. 
EXAMPLE 7 

Genotyping With High-Density Oligonucleotide "Tag" Arrays 

containsover 32,000 pre-selected 20-mer oUgonucleodde probes, co»^ 

■ 15 ^er-specificPCRampUfi^^ 

beendeveloped-wehaveusedtbisme^odtogenotypeacollecuonofm 

smgle-nucleotidepo^^ 

genes. First, marker-specific primers were used in multiplex PCR reactions to amplify 
specific genomic regions containing the SNPs. The PCR amplified DNA products were 
then used as templates in SBE reactions. Each SBE primer comprises a 3' portion and a 
5' portion. The 3' portion is complementary to the specific 
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The tag array strategy begins with an array of tag sequences selected in a manner 
that all tag probes are the same length, e.g., 20 nucleotides long, with similar melting 
temperature and G-C content, and the lowest sequence homology to each other 11. 
Therefore, these tags are likely to have similar hybridization characteristics and minimal 
cross-hybridization to other tag sequences. 

The design and use of a 4,000-tag array for the functional analysis of yeast 
Saccharomyces cerevisiae genes 11 and drug sensitivity studies 12 have been described. 
More recently, we have designed and fabricated an array that contains more than 32,000 
such tags, and developed it as a genotyping tool, in combination with marker-specific 
PCR amplifications and SBE reactions. 

As shown in Fig. 7, marker specific primers are designed and used to amplify 
each single nucleotide polymorphism (SNP). A multiplex PCR strategy is used to 
amplify these SNPs from genomic DNAs 9. In general, SNPs with the same base 
composition at the polymorphic site (e.g., all the A/G polymorphisms) are grouped 
together. After PCR amplification, excess primers and dNTPs are degraded and 
de-phosphorylated using Exonuclease I and Shrimp Alkaline Phosphatase, respectively. 
These enzymatically treated PCR products (double-stranded) then serve as 
templates in the SBE reaction. An SBE primer is designed for each genetic marker, 
which terminates one base before the polymorphic site. Each primer is tailed with a 
unique tag that is complementary to a specific probe sequence synthesized on the tag 
array. The extension reaction is multiplexed: SBE primers corresponding to 
multiple markers (up to 56 markers that we have tested so far) are added in a single 
reaction tube, and extended in the presence of pairs of ddNTPs labeled with different 
fluorophores, e.g., for an A/G variant, biotin-labeled ddATP and fluorescein-labeled 
ddGTP are used. The resulting mixture of SBE reactions is hybridized to the tag array. 
Each tag hybridizes to a specific probe position on the chip. The ratio of the intensities 
of the colors indicates the genotype (homozygous wild type, or homozygous mutant, or 
heterozygous) or the allele frequency (ranging from 0% to 100%) in the samples tested. 
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multiplexing SBE assay was developed with a complexity of 9 to 28 markers m each 
reaction and a total of 9 reactions for the 165 markers. 21 of them (12.7%) failed the 
multiplexing PCR and multiplexing SBE assay. Therefore, 144 markers from 49 genes 
passed the assay development. The gene location, polymorphic sites, and the designed 
primers for these 144 markers are summarized in Table 1. 

We then genotyped 44 individuals using 44 tag arrays. Good hybridization 
signals were obtained in 96.5% (6116/6336, where 6336 = 144 x 44) of the cases. The signal 
intensity values from the hybridization results were used in clustering analysis for each 
of the 144 markers. Genotypes for each individual at the 144 loci were assigned 
automatically based on the clustering results, with some manual editing. Data Desk .0 
(Data Description, Inc.) was used to manually display the clustering analysis results (of 
the intensity ratios of the two colors). Overall, 80-85% of the markers form good 
clusters. 

We performed gel-based DNA sequencing to determine the genotypes 
at 115 loci in 3 of the 44 individuals (see Methods). Comparison of the ABI sequencing 
results and the chip results revealed 14 discrepancies (4%) out of 115 x 3 = 345 
genotype calls. Most of the discrepancies occurred in cases where one method called 
homozygous, while the other method called heterozygous. In one case (marker 
ICAMlex6.254), where the ABI sequencing method called G/G, but the tag array/SBE 
assay method called A/A in all three individuals, we believe the discrepancies are 
due to mis-priming of the SBE primer to adjacent sequences. 

We also tested the reproducibility of the tag array/SBE assay genotyping 
method. We repeated the multiplexing PCR, SBE and the chip hybridization 
experiments in 4 individuals. The ratios of the two colors (for each of the 144 markers) 
in the replicated experiments are not all exactly the same, but they all fall into the same 
cluster (i.e., giving the same genotype call). Therefore, we did not find any discrepancy in 
the genotyping call of duplicated samples. 
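
The summary figures reported above are mutually consistent, as the following arithmetic check shows. The variable names are illustrative; all counts are taken from the text.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


markers_tested = 165
markers_failed = 21
markers_passed = markers_tested - markers_failed      # markers passing assay development

individuals = 44
possible_calls = markers_passed * individuals         # 144 x 44
good_signals = 6116

sequenced_calls = 115 * 3                             # 115 loci in 3 individuals
discrepancies = 14

failure_rate = percent(markers_failed, markers_tested)
signal_rate = percent(good_signals, possible_calls)
discrepancy_rate = percent(discrepancies, sequenced_calls)
concordance_rate = 100.0 - discrepancy_rate
```

On these counts, about 96% of the genotype calls compared by both methods agreed between the tag array/SBE assay and gel-based sequencing.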
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SBE Primer 


AGAGTCTATAAGCAT 
CGTCGGGCGACGAAG 
CTTCCGAGGAA 


TCAGACAATTCTATA 

CGCGGTGGAGAGGAA 

GCAGAAGGGCT 


TCGTGAGTTGTCCTGC 

TGCAGCACCCTCTGC 

TGGTCCC 


GCCTGTAATGGTGGA 
TCTCAGTCCCCAGCC 
AGGAGGCA 


GATCTGTCTGACGCT 
GTATGGCAGCCAGGC 
AACAACCAGC 


CGTGATAATGCGTCT 
CGTAGCAGGACCTAG 
AACGGGCAGC 


CATTATCGGACATGC 
TCACTTGGAGCTCAA 
GCCATTCAA 


ATGATGAGCCGTGAT 
GACCCCTGACGAATG 
TGATGGCCAC 


TACATCGCTTGCATG 
AGTGTGAGCTGCAGC 
CACTCTACCT 


Reverse Primer 


GGGACTGCTTCCATTCTGC 


GACCACAAGCACTCACCTTC 


< 
O 

o 

I 

e 

< 


TGACTGTCACCTGTTGGGA 


GTGGGTGGTTGTCTGGC 


TCCTGGGCAGGCAGC 


CGTCAGATCTGGTAGGGGG 


GGTCTTCATATTTCCGGGAT 


CGTAGGCATGCAGGTTG 












AGCCAGGCAACAACCAG 
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Forward Primer 


GACGAAGCTTCCGAGGA 


GAGAGGAAGCAGAAGG< 


GCACCCTCTGCTGGTCC 


GCACCCTCTGCTGGTCC 


AGGACCTAGAACGGGO 




TGGAGCTCAAGCCATTC 


GACGAATGTGATGGCCy 


GAGCTGCAGCCACTCT/ 


SNP Flanking Sequence 


O 

I 

o 

3 

O 

o 

< 

o 
< 
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O 
O 

< 
< 

8 
y 
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i 

s 

1 

< 

1 


CCAGGAGGCA(T/C)CCCAACAGGT 


AACAACCAGC(A/G)GCCAGACAAC 


AACGGGCAGC(G/A)CTGCCTGCCC 


O 

i 

< 


TGATGGCCAC(A/G)TCCCGGAAAT 


§ 
I 

U 

e 

< 

u 


Gene/Exon/Position 1 


AADDEX 10.246 


AADDEX13.173 


ACEEX13.138 


ACEEX13.151 


ACEEX 13.202 


ACEEX15.144 


ACEEX17.19 


ACEEX17.52 


ACEEX18.130 
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CCATCGAATCXjTCTA 

TCAGTACTTTGTGCTG 

ACGGCTGCTC 


GGTCTCAATTAGGCT 

TCATGTACTCCGCTCT 

GGCCACCTTG 


GCCGGTCATGTGCTC 
TGATATCACCACACT 
GGCCAAGGACA 


GCGTGATATTCCATG 
ATCTGAGGTTCTGGA 
ATCCCGGAG 


GCTGGTGATGGCTCT 
TCATATGGAGGACTA 
CCTGGATATCAAG 


CGAACATCTGTCACA 
ATGCGCTCGGTTAGC 
GACCAATTGTCA 


GACTCTAGTGTCGTCT 

GATCTCTTTGGTGAA 

ACCATGGTAGAAG 


TCAGATGTTGTAATC 
GTGCGCAAGGTGTGG 
AAGGAGCACTT 


GCGTCGGCTTCATGC 
GATATTACACCAGCA 
TCGTGGCGGAG 


ATGCACGATCCTCTA 

CATTGGGACTTCTCCC 

AGGCCCTGA 


TGTCTCACCTTCCTGCACA 


AGGTGATGGAGGCGAGA 


CCITGACCACCTCCTCCA 


CCGCCAACATGGTCTTC 


i ■ 
§ 


GACGCTCACTGCAAGTCG 


TCAAGGAGAATGGTGCTCC 


ATGCAGTCCCAGGCCT 


AGTTCCGCATTCAACAGG 


! 

< 

o 
u 


TGTGCTGACGGCTGCT 


CCGCTCTGGCCACCTT 


i 

i 
g 

I 


GGTTCTGGAATCCCGGA 


TGGAGGACTACCTGGATAT 
CAA 


CGGTTAGCGACCAATTGTC 


TTTGGTGAAACCATGGTAG 
AA 


AAGGTGTGGAAGGAGCACT 


CAGTACACCAGCATCGTGG 


GCAGTGGCCAGGGACT 


1 
1 

§ 

< 
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o 

< 
u 
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s 
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g 

1 

o 


AATCCCGGAG(G/C)TGAAGACCAT 


U 
O 

o 
o 

! 

I 


CCAATTGTCA(T/G)ACGACTTGCA 


ATGGTAGAAG(T/C)TGGAGCACCA : 


8 

O 
O 
< 

< 
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< 
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CAGGCCCTGA(A/G)GAAGAAGGTG 


> 
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CLCNKBEX 10.33 


CLCNKBEX 15.64 


CLCNKBEX4.19 


CLCNKBEX4.70 


COX2EX 1.358 


COX2EX10.156 | 


CYP11B1EX4.205 


CYP11BIEX5.107 


CYPUB2EX3.152 
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CTTACCCATGATTAG 

CGCAGGGAACCCCGA 

CGTGCAGCAGA 


GCCGATGGTGCGTCT 
ACTATGTCTGTTTrTG 
GAGCGAGTGG 


TGGCAGGTTGTGACT 
CTCTCAACCGGAGTT 
GCCCTCAGAC 


TATGATTATTGAGTG 
CGGCCCTGCGACTCC 
AAGATGAAACCC 


TCAGATCGTCTTGCTG 
TCGAACCCAGAGGAA 
GCCGGCCTT 


TTTGAGATTTGTCGA 
GAGCCACTGACCCCT 
ATTCCCTGCTT 


GCCTGCTGTGGCTGT 
ATATCAGATAACTTC 
CTTTGTAGTCCATC 


GATCACTGTGGTCCC 
TGTCTGTAGCTGGAC 
TTTCTGC 


TATGAGTGTTGCGCT 
ATGCCTCATCTGGGA 
ATTGGGACAAC 


GCGTCGCTGTCGTGT 
ACTATCCACAGGGGA 
GTGGGACAAC 


GCTCTCCTGGCGCAG 


TGAAGCACCAAGTCTGAGC 
T 


GGACCTCCATGGTGCAC 


GGCAGTAGTTGAGGCGG 


CCTGGACCCCCGAAG 


CTCTGACACCCCTCAAGTTC 


GCAGATAACTTCCTTTGTAG 
TCCA 


GTCAGGAGGGAGAGTCCAG 


8 

u 


CCTTCACATGTGGGCTTC 


CCCGACGTGCAGCAG 


GCTCTACCCTGTGGGTCTGT 
CCGGAGTTGCCCTCAGA 


GCGACTCCAAGATGAAACC 


CCAGAGGAAGCCGGC 


TGACCCCTATTCCCTGCT 


CTGAAGCCATAGGTnTGAT 
ATAAT 


< 

i 


CATCTGGGAATTGGGACAA 


CACAGGGGAGTGGGACA 


GTGCAGCAGA(T/C)CCTGCGCCAG 


GAGCGAGTGG(T/C)GAGCTCAGAC 
GCCCTCAGAC(G/A)CGTGCACCAT 


GATGAAACCC(G/A)ACCGCCTCAA 


AGCCGGCCTT(G/T)CCTTCGGGGG 


CTCAGAGGAC(A/G)ACCCCGAGTA 


O 

g 

< 

o 
o 

I 

o 


1 
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TTGGGACAAC(G/A)AGAAGCCAAC 
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CYP11B2EX7.65 
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DBHEX4.132 


DBHEX5.39 
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EDNRBEX3.144 


ELAM1.77 
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ELAM1EX7.200 
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ATACGGGATGATGAG 
CATACTGCTGCAGGC 
CCCAGATGA 


TACATGACTTGCCCT 

GCTGTTTCATGATCCC 

AAGCTGAAAGGCAA 


ACGATGAGCAGGGAT 
CACTAACAGGTGCAG 
CACGCAGCC 


ATCTGAGAGCTAGTC 

GGCATCCACCCTCTCT 

CAGAAGGTC 


GGTGACTATTCGGCT 
GCTCTACCAGCAATG 
ACAACATGGGCT 


TAGCTGTGTTGACAT 
CTGGCACAGAAACAC 
CACAGCACTAATT 


TGCTTAGTTGTGAGT 

CGCCAGAGCAGAGTG 

CAGTGTGCCT 


CTCACGACTGGGCTG 

ATGATTCCATCCCTCC 

AGGCACCCTCA 


TGGCACAGTTTCCTG 
CTGGTGGCTCCACCT 
GTCATTTCTCTTGT 


GCTGGGTGTGATCCT 
CTCTACAAGAGAATG 
GCCACTGGTCA 


CAGAAGGAAGAGTTCTGGG 
G 


CACATAACGCTCTCTGGAGG 




CACCGTCTTTGCGCC 


CCGCAGGATCCACCA 


ACTGCACTCTGCTCCACAG 


GCTGTGCTGTGGAGCATG 


AGAGGGCCCAGAGGGT 


CCCACCCATTATCAGACCTA 




TGCAGGCCCCAGATG 


TCCCAAGCTGAAAGGCA 


CAGGTGCAGCACGCA 


CCCACCCTCTCTCAGAAGGT 


AGCAATGACAACATGGGC 


CTAAACAGAAACACCACAG 
CAC 


GCAGAGTGCAGTGTGCC 


CCCTCCAGGCACCCTC 


TGGAGCGGTGGCTTCTA 


AAGAGAATGGCCACTGGTC 


CCCCAGATGA(T/G)CCCCCAGAAC 


TGAAAGGCAA(G/T)CCCTCCAGAG 


GCACGCAGCC(G/C)CTCCGGGAGC 


TCAGAAGGTC(G/C)CGGCGCAAAG 


AACATGGGCT(T/G)CTGGTGGATC 




CAGTGTGCCT(T/C)CCATGCTCCA 




TTTCTCTTGT(A/G)ACAATGGCTT 


CCACTGGTCA(A/C)CTACCGTGCC 














eNOS.78 


ETIEX5.90 


GALNREXl 


GALNREXl 


GGREX9.2S 


GLUT2EX1 


GLUT2EXI 


GLUT4EX3 


0 
GMP- 140. 1<
GCATGAAGTTCCATA
ATCGCGAGCCTCCAA 
TGGCATCA 


CAGTGACATGCCGCT 

CAGTACATCTTCTCCA 

TCCTTGGTTACATG 


CGGCAATATGATGAT 
AGGTCCCCATGAACA 
CAAGGTCA 


CCTGGTATGACATGG 
AGCCTCAGCATCAAC 
TGTATCACCAGCTTC 


CCAACGATGCTACTG 

AGTCACGCCCTGTTCT 

GCATAACCA 


CATTGCACCCACTGA 
GATGGATTGTCACCA 
GGATCAATGAC 


CACGGATCTGCCGCT 
AGAATCATCTGGTGG 
AGTAATTTTCC 


CGAACACATGCGGCT 
GGATAAGCTGCGGGG 
AGCAGGG 


AGATAGAGTCGATGC 
CAGCTTTGCAGTGGC 
CGCCGCCG 


TGCCTCATTGTGACTC 
ATGGACAGCTGTAAA 
TTTCTGCTGGACAA 


GTCGATGTGCAGGTAGGC 


TGTTGACCTTGTGTTCATGG 


TGTGGCCACATCCTCAAT 


AGATGGCGAACCCAGAG 


GCCCAGCCCCTACTCAC 


ACTCTCCTTACCGTGTGTGA 
A 


GCTGAACTGACATTAGAGG 
TGA 


GGGCGCTCTGGGAGA 


AGGGCTGATGCCGCT 


ATGAATAGGTGTGGGTGTA 
CG 


GAGCCTCCAATGGCATC 




GCCCATGAACACAAGGTC 


GCATCAACTGTATCACCAGC 
TT 


CGCCCTGTTCTGCATAACC 


AATTGTCACCAGGATCAAT 
GA 


TCACATCTGGTGGAGTAATT 
TTC 


GCTGCGGGGAGCAGG 


CTTGCAGTGGCCGCC 




AATGGCATCA(A/C)TGCCTACCTG 


TGGTTACATG(G/C)CCCATGAACA 


CACAAGGTCA(A/G)CATTGAGGAT 


TGCATAACCA(A/G)GGTGAGTAGG


GCCGCCGCCG(A/C)CAGCGGCATC
NETEX5,


NETEX7. 




NETEX7. 1 




NETEX7.1 


NETEX9.1 




OB. 160 




OB-R.174 
PLA2AEX;




PLA2AEXj 
TGTGAGCTTGTTACT

ACGGCTGCCTTCTGCT 

TGGAGGCTGTG 


TGTGAATATGTGTGT 
GCCACTGAGGCCTGG 
GGGGCACC 


TGAGACTATTTAGGC 
TGTGCTCCTCCTCATC 
GGGGCCC 


GATCGCAGTTCAGAG 
CGCATATTTTCTTGAC 
CCCTACTTAC 


CAGTCTCGTGOATAG 

CACTCGTTCTCCGCAT 

CCAGAACATTCTAT 


GACTGGGATTACATG 
CTATGGAGGCCCACA 
CCAACTTT 


CACTCCGATGGCGAG 
ATGAATTTGGCTTCC 
AGCCTGACA 


GCACCGTCTGTCGAT 
CTATACAGAGAGAGA 
CCTGCATTG 


AGCCAAGTGCAGGCG 
TACATCCTGGAAACA 
GGCTCCCCCA 




AGCTGGCAAGATCTGGG 


CCAGGTACCACGACTCCTC 


CCAGGTACCACGACTCCTC 


CCCAAATACATCTCCCAGGA 


CATAAACTGTAGTCACTGTA 
GGCTTCT 


CCGTGTCAGGCTGGAAG 


GTGTTGGGGCTGCGG 


GCAGGACTCCTTGCACAT 


ACGGGGAGCTTCTGGA 




CCTTCTGCTTGGAGGCTGT 




GCTGAGGCCTGGGGG 


TCACTATnTCTTGACCCCT 
ACTT 


CCGCATCCAGAACATTCTA 


GGAGGCCCACACCAACTT 


TTGGCTTCCAGCCTGAC 


CGCAGAGAGAGACCTGCAT 
T 


GGAAACAGGCTCCCCC 


CAGTATTGAGATGCTTCTGT 
CCA 




GGGGGGCACC(T/A)CCTCCTCATC 


ATCGGGGCCC(T/A)GGAGGAGTCG 


ACCTGCATTG(G/T)CATGTGCAAG


GGCTCCCCCA(T/C)GTCCAGAAGC 


TTCTGTCCAA(C/A)TTCGGTGGCC 




















PNMTEX3.il 


PNMTEX3.2! 


PNMTEX3.2C 


PON1.584 


PON2.949 


SCNN1B.222 


SCNN1B.238


SCNN1B.AA4- 


SCNN1G.172 


SCNN1G.21 
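
By way of illustration only, the polymorphism entries in the table above, in which the two alleles appear in parentheses between flanking sequences (for example GTGCAGCAGA(T/C)CCTGCGCCAG), might be parsed as sketched below. The function name and the regular expression are assumptions made for this example and are not taken from the specification.

```python
import re

# Hypothetical pattern for entries of the form FLANK(X/Y)FLANK.
POLYMORPHISM = re.compile(r"^([ACGT]+)\(([ACGT])/([ACGT])\)([ACGT]+)$")


def parse_polymorphism(entry):
    """Split 'FLANK(X/Y)FLANK' into (5' flank, (allele X, allele Y), 3' flank)."""
    match = POLYMORPHISM.match(entry.strip())
    if match is None:
        # Entries damaged in reproduction will not match and are rejected.
        raise ValueError(f"not in FLANK(X/Y)FLANK form: {entry!r}")
    left, first, second, right = match.groups()
    return left, (first, second), right


left, alleles, right = parse_polymorphism("GTGCAGCAGA(T/C)CCTGCGCCAG")
```

Entries whose characters are not limited to A, C, G and T are rejected rather than guessed at.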
While this invention has been particularly shown and described with references to
preferred embodiments thereof, it will be understood by those skilled in the art that 
various changes in form and details may be made therein without departing from the 
scope of the invention encompassed by the appended claims. 
What is claimed is:

1. An oligonucleotide array comprising one or more oligonucleotide tags fixed to a
solid substrate, wherein each oligonucleotide tag comprises a unique known
arbitrary nucleotide sequence of sufficient length to hybridize to a locus-specific
tagged oligonucleotide, wherein the locus-specific tagged oligonucleotide has at
its first end nucleotide sequence which hybridizes to, e.g., is complementary to,
the arbitrary sequence of the oligonucleotide tag.

2. A kit comprising:

(a) an array comprising one or more oligonucleotide tags fixed to a solid
substrate, wherein each oligonucleotide tag comprises a unique known
arbitrary nucleotide sequence of sufficient length to hybridize to a
locus-specific tagged oligonucleotide; and

(b) one or more locus-specific tagged oligonucleotides, wherein each
locus-specific tagged oligonucleotide has at its first (5') end nucleotide
sequence which hybridizes to, e.g., is complementary to, the arbitrary
sequence of a corresponding oligonucleotide tag on the array, and has at
its second (3') end nucleotide sequence complementary to target
polynucleotide sequence in a sample.
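
For illustration, the arrangement recited in claim 2 can be sketched as follows, assuming that all sequences are written 5' to 3' and that the 5' portion is the exact reverse complement of the array tag. The function names and the short example sequences are hypothetical.

```python
# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tagged_oligo(array_tag, locus_primer):
    """Join a 5' portion that hybridizes to the array tag with a 3'
    locus-specific portion (assumed already complementary to the target)."""
    return reverse_complement(array_tag) + locus_primer


# Hypothetical four-base tag and three-base locus portion, for brevity only.
example = tagged_oligo("AACG", "TTT")
```

Hybridization requires only sufficient complementarity, so exact reverse complementarity is a simplification adopted for the sketch.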

3. A method of genotyping a nucleic acid sample at one or more loci, comprising
the steps of: 

(a) obtaining a nucleic acid sample to be tested; 

(b) combining the nucleic acid sample with one or more locus-specific 
tagged oligonucleotides under conditions suitable for hybridization of the
nucleic acid sample to one or more locus-specific tagged
oligonucleotides, wherein each locus-specific tagged oligonucleotide
comprises a nucleotide sequence capable of hybridizing to a 
complementary sequence in an oligonucleotide tag and a nucleotide 
sequence complementary to the nucleotide sequence 5' of a nucleotide to 
be queried in the sample, thereby creating an amplification product- 
locus-specific tagged oligonucleotide complex; 

(c) subjecting the complex to a single base extension reaction, wherein the 
reaction results in the addition of a labeled ddNTP to the locus-specific 
tagged oligonucleotide, and wherein each type of ddNTP has a label that 
can be distinguished from the label of the other three types of ddNTPs; 

(d) contacting the complex with an oligonucleotide array comprising one or 
more oligonucleotide tags fixed to a solid substrate under suitable 
hybridization conditions, wherein each oligonucleotide tag comprises a 
unique arbitrary sequence complementary and of sufficient length to 
hybridize to a complementary sequence in a locus-specific tagged
oligonucleotide, whereby the complex hybridizes to a specific 
oligonucleotide tag on the array; and assaying the array to determine the 
labeled ddNTPs present in the complex hybridized to one or more 
oligonucleotide tags, 

thereby determining the genotype of the queried nucleotide in the sample. 
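
As a non-limiting sketch of the final assaying step of claim 3, the labels detected at one tag location might be converted to a genotype call as shown below. The per-base signal dictionary, the function name, and the 20% signal-share threshold are assumptions chosen for illustration only.

```python
def call_genotype(signals, min_fraction=0.2):
    """Call the queried nucleotide(s) from label signals at one tag location.

    signals maps each ddNTP base ('A', 'C', 'G', 'T') to its measured label
    intensity. Bases whose share of the total signal reaches min_fraction
    (a hypothetical cutoff) are reported; one base suggests a homozygote,
    two bases a heterozygote.
    """
    total = sum(signals.values())
    if total <= 0:
        return None  # no usable signal at this location
    called = sorted(b for b, s in signals.items() if s / total >= min_fraction)
    return "/".join(called)


heterozygote = call_genotype({"A": 900, "C": 10, "G": 850, "T": 5})
homozygote = call_genotype({"A": 1000, "C": 20, "G": 30, "T": 10})
```

In practice a cutoff of this kind would be set empirically for the labels and detection system in use.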

4. A method to aid in determining a ratio of alleles at a polymorphic locus in a
sample, comprising the steps of: 

(a) using a pair of primers to amplify a region of a nucleic acid in a sample, 
wherein the region comprises a polymorphic locus, whereby an amplified 
DNA product is formed; 

(b) labeling an extension primer by a single base extension reaction to form
a labeled extension primer, wherein the amplified DNA product is used
as a template, wherein the extension primer comprises a 3' portion and a
5' portion, wherein the 3' portion is complementary to the amplified
DNA product and terminates one nucleotide 5' to the polymorphic locus,
wherein the 5' portion is not complementary to the amplified DNA
product, whereby a labeled dideoxynucleotide which is complementary
to the polymorphic locus is coupled to the 3' end of the extension primer,
wherein each type of dideoxynucleotide present in the reaction bears a 
distinct label; and 

(c) hybridizing the 5' portion of the extension primer to one or more probes 
complementary to the 5' portion which are immobilized to known 
locations on a solid support.

5. The method of claim 4 wherein two complementary strands of the amplified
DNA product are present in the single base extension reaction.

6. The method of claim 4 wherein two complementary strands of the amplified
DNA product are used as templates in the step of labeling. 

7. The method of claim 4 wherein the label is a fluorescent label.

8. The method of claim 4 wherein the label is a radiolabel. 

9. The method of claim 4 wherein the label is an enzyme label. 

10. The method of claim 4 wherein the label is an antigenic label. 

11. The method of claim 4 wherein the label is an affinity binding partner.



12. The method of claim 4 further comprising the step of:

(d) optically detecting a fluorescent label on the solid support. 
13. The method of claim 4 wherein the step of labeling employs at least two distinct
dideoxynucleotides bearing distinct labels. 

14. The method of claim 4 wherein the step of labeling employs four distinct 
dideoxynucleotides bearing distinct labels. 

15. The method of claim 4 further comprising the steps of:

(d) comparing quantities of a first and a second label at a location on the 
solid support; and 

(e) determining the ratio of nucleotides present at the polymorphic locus in 
the sample. 
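
Steps (d) and (e) of claim 15 can be illustrated by the following sketch, which assumes that the two label quantities are directly comparable after subtracting a common background value. The function name and the background parameter are hypothetical, and any calibration between different labels is omitted.

```python
def allele_fraction(first_label, second_label, background=0.0):
    """Estimate the fraction of the first allele at one location on the support.

    first_label and second_label are the measured quantities of the two
    distinct labels; background is a hypothetical common offset subtracted
    from both before the comparison.
    """
    first = max(first_label - background, 0.0)
    second = max(second_label - background, 0.0)
    if first + second == 0:
        raise ValueError("no signal above background")
    return first / (first + second)


# A 3:1 ratio of the first allele to the second corresponds to 0.75.
fraction = allele_fraction(300.0, 100.0)
```

For a sample pooled from several individuals, as in claim 17, the same fraction may be read as an estimate of the allele frequency in the pool.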

16. The method of claim 15 wherein the ratio of nucleotides present at two or more
polymorphic loci is determined simultaneously. 

17. The method of claim 4 wherein the sample comprises DNA from two or more 
individuals. 

18. The method of claim 17 wherein the ratio of nucleotides present at two or more 
polymorphic loci is determined simultaneously.

19. The method of claim 4 wherein the solid support is selected from the group 
consisting of beads, microtiter plates, and oligonucleotide arrays. 

20. A set of primers for use in determining a ratio of nucleotides present at a 
polymorphic locus, comprising: 

(a) a pair of primers which when in the presence of a DNA polymerase
amplify a region of double stranded DNA, wherein the region comprises
a polymorphic locus; and 
(b) an extension primer which comprises a 3' portion which is
complementary to a portion of the region of double stranded DNA and a 
5' portion which is not complementary to the region of double stranded 
DNA, wherein the extension primer terminates one nucleotide 5' to the 
polymorphic locus. 

21. A kit comprising in a single container two or more of the sets of primers of
claim 20.



22. A kit comprising in a single container: 
(a) a set of primers of claim 20; and 

(b) a solid support comprising a probe which is attached to a solid support,
wherein the probe is complementary to the 5' portion of the extension 
primer. 

23. The kit of claim 22 wherein the solid support is an oligonucleotide array. 

24. The kit of claim 22 wherein the solid support is a bead. 

25. The kit of claim 22 wherein the solid support is a microtiter plate.

26. A method to aid in determining a ratio of alleles at a polymorphic locus in a 
sample, comprising the steps of: 

(a) labeling an extension primer by a single base extension reaction to form 
a labeled extension primer, using a DNA molecule as a template, 
wherein the extension primer comprises a 3' portion and a 5' portion, 
wherein the 3' portion is complementary to the DNA molecule and
terminates one nucleotide 5' to a polymorphic locus, wherein the 5' 
portion is not complementary to the DNA molecule, whereby a labeled 
dideoxynucleotide which is complementary to the polymorphic locus is
coupled to the 3' end of the extension primer, wherein each type of 
dideoxynucleotide present in the reaction bears a distinct label; and 
(b) hybridizing the 5' portion of the extension primer to one or more probes 
complementary to the 5' portion which are immobilized to known 
locations on a solid support. 

27. The method of claim 26 wherein two complementary strands of the DNA
molecule are present in the single base extension reaction. 

28. The method of claim 27 wherein each complementary strand of the DNA
molecule is used as a template to label an extension primer. 



29. The method of claim 26 wherein the label is a fluorescent label.

30. The method of claim 26 wherein the label is a radiolabel. 

31. The method of claim 26 wherein the label is an enzyme label.

32. The method of claim 26 wherein the label is an antigenic label. 

33. The method of claim 26 wherein the label is an affinity binding partner.

34. The method of claim 26 further comprising the step of: 

(c) optically detecting a fluorescent label on the solid support. 

35. The method of claim 26 further comprising the steps of: 

(c) comparing quantities of a first and a second label at a location on the 
solid support; and

(d) determining the ratio of nucleotides present at the polymorphic locus in
the sample. 

36. The method of claim 35 wherein the ratio of nucleotides present at two or more 
polymorphic loci is determined simultaneously. 

37. The method of claim 26 wherein the sample comprises DNA from two or more 
individuals. 

38. The method of claim 34 wherein the ratio of nucleotides present at two or more 
polymorphic loci is determined simultaneously. 

39. The method of claim 26 wherein the step of labeling employs at least two 
distinct dideoxynucleotides bearing distinct labels. 



40. The method of claim 26 wherein the step of labeling employs four distinct
dideoxynucleotides bearing distinct labels. 